

Digitized by the Internet Archive 
in 2023 with funding from 
University of Alberta Library 


https://archive.org/details/Scissons1976


THE UNIVERSITY OF ALBERTA 


RELEASE FORM 


NAME OF AUTHOR Edward H. Scissons 

TITLE OF THESIS Convergence of Clinical Judgement: A Multitrait 
Analysis 

DEGREE FOR WHICH THESIS WAS PRESENTED Doctor of Philosophy 


YEAR THIS DEGREE GRANTED 1976 

Permission is hereby granted to THE UNIVERSITY OF ALBERTA 
LIBRARY to reproduce single copies of this thesis and to lend or sell 
such copies for private, scholarly, or scientific research purposes 
only. 

The author reserves other publication rights, and neither the
thesis nor extensive extracts from it may be printed or otherwise
reproduced without the author's written permission.


THE UNIVERSITY OF ALBERTA 


CONVERGENCE OF CLINICAL JUDGEMENT: A MULTITRAIT ANALYSIS 


by 
EDWARD H. SCISSONS 


A THESIS 
SUBMITTED TO THE FACULTY OF GRADUATE STUDIES AND RESEARCH 
IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE 


OF DOCTOR OF PHILOSOPHY 


DEPARTMENT OF EDUCATIONAL PSYCHOLOGY 


EDMONTON, ALBERTA 


FALL, 1976 




THE UNIVERSITY OF ALBERTA 


FACULTY OF GRADUATE STUDIES AND RESEARCH 


The undersigned certify that they have read, and recommend
to the Faculty of Graduate Studies and Research, for acceptance, a
thesis entitled Convergence of Clinical Judgement: A Multitrait
Analysis submitted by Edward H. Scissons in partial fulfilment of
the requirements for the degree of Doctor of Philosophy.




ABSTRACT 

This research was a study of the convergence of clinical 
judgement (multitrait ratings) across three different information 
sources (psychometric tests, interview, and test + interview). Of 
major interest was the similarity of clinical evaluations of ability 
across the three different information sources. 

Subjects (N=74) were executives appraised by a firm of industrial 
psychologists. Subjects were evaluated independently on 18 different 
traits on the basis of: test information alone, interview information 
alone, or test and interview information combined. 

Results indicate a varying degree of convergence of clinical ratings 
dependent on clinician and trait. A clinician by factor by rating 
condition model of executive assessment is suggested. Convergence
indices ranged from a high of .64 to a low of .05. The nature of
reliability theory, as it pertains to clinical judgement research, 
is discussed and suggestions for further research in the area are
presented.

CHAPTER I
STATEMENT OF THE PROBLEM

"Progress in psychological assessment is important not only to
such applied fields as clinical, counselling, educational, and
industrial psychology, but is vital also to the continued development
of psychology as a whole" (McReynolds, 1968, p. 1). Modern
psychology is directly concerned with understanding the human
condition; research in psychology has traditionally been oriented
towards the development and evaluation of better assessment techniques
and procedures (McReynolds, 1968).

The "clinical judgement debate", as it has been called, 
developed as a vigorous movement in psychology in the early 1950's. 
Of concern were problems such as the ability of psychologists to 
predict future behavior, the validity and reliability of prediction, 
and clinical versus actuarial methods of prediction. Early
writings, such as that of Meehl (1954), did much to spark debate 
between the psychodiagnosticians on one hand and the actuarially 
oriented researcher or clinician on the other. Of most importance 
was the accuracy (in all senses of the word) of assessment decisions 
based on either clinical or actuarial integration of client infor- 
mation. The controversy is far from over, but research of late 
has concentrated more on improving both methods of prediction or 
decision making rather than fanning the fires of difference that 
exist between the two (Goldberg, 1970). 

Managers, administrators, and other executives play an important
role in modern society and are always in short supply
(Dunnette, 1971). This is not to imply that managers and other
professionals are in short supply but rather that good managers,
good administrators, and good executives remain a scarce commodity
in the occupational marketplace.

Industrial psychologists and other professionals concerned 
with what makes a good executive and how to identify a good 
executive by methods other than trial and error, are involved in a
specialized aspect of the clinical judgement dilemma. Research 
here has focused on studies concerned with the predictive validity 
of executive assessments and studies which investigated the assess- 
ment process itself from the points of view of validity and 
reliability. Thus we have studies such as that by Bray & Grant (1966) 
that investigate the specific contribution of the interview to 
over-all executive assessment and studies such as that by Wollowick 
& McNamara (1969) which look at the components of an executive 
assessment program. Other research, not specifically dealing with 
clinical judgement, has been concerned with the interview as a 
diagnostic technique (Webster, 1964; Grant & Bray, 1969; Ulrich & 
Trumbo, 1965; Mayfield, 1964), testing as an adjunctive or sole 
means of executive assessment (Henrichs, 1969; Spitzer & McNamara, 
1964; Bray & Moses, 1972), or various multiple assessment techniques 
(Albrecht, Glaser & Marks, 1964; Wollowick & McNamara, 1969; 
Campbell, Otis, Liske & Prien, 1962). 

The relationship of clinical judgement to executive appraisal
is a logical one. Clinical judgement is concerned with assessment;
those dealing with executive appraisal are also concerned with
assessment at a very operational level. Although most research in 
the area of clinical judgement has been concerned with unidimen- 
sional decision making, e.g., the diagnosis of psychotic versus 
neurotic from MMPI profiles (Goldberg, 1965), some researchers 
(Goldberg & Werts, 1966; Donaldson, 1969) have addressed themselves 
to a more complex multitrait multimethod approach (Campbell & 
Fiske, 1959). There is, however, little research which relates 
these clinical judgement findings from clinical psychology to the 
multitrait multimethod domain of executive appraisal. There has 
been virtually no work, other than that concerned with assessment 
centers, which relates this multitrait multimethod model to 
executive appraisal in a natural setting. It is the use of this
natural setting which is most likely to result in research findings
high in [illegible] as a result of high ecological (external)
validity (Snow, 1974).

Even within the domain of clinical judgement in clinical
psychology, most research has focused on prediction accuracy,
stability, or consensus rather than convergence as measures of
judgement effectiveness. Convergence in clinical judgement is
important because it yields a measure of the degree of similarity
in the assessments a clinician makes with respect to his clients
as a result of the different types of data available about these
clients, e.g., test versus nontest data (Goldberg & Werts, 1966).

This study is an investigation of the convergence of clinical
judgement in executive appraisal. The hypothesis tested is that 
there will be a significant difference in the assessment of a client 
by a clinician depending on the type of information available 
about that client. Of specific interest in this study are the 
differences in appraisal (multitrait ratings) as a result of 
information obtained by (a) interview alone, (b) testing alone, or 
(c) testing + interview combined. 

This study is of considerable importance at both a theoretical
and an operational level. At a theoretical level, the rationale
for the study focuses on the generalizability of clinical findings
across data bases; the nature of the interaction between trait,
information base, and clinical judgement, particularly as these
affect multitrait analysis of ability; and the provision of an
empirical base for further predictive validity studies once the
problem of convergence has been accounted for. At present, there
exists no research to provide a rationale for the generalizing of
clinical judgement findings across data bases; there appears to be
an unmet assumption of high convergence.

At an operational level, this study is important because it 
is concerned with the possible duplication of psychological services. 
If high convergence is evident on several or all of the traits 
involved in this multitrait analysis, cost alone should dictate a 
judicious duplication of services through multimethod assessment
techniques.

CHAPTER II
REVIEW OF THE LITERATURE 

"To many people, the prediction problem must seem to be the 
basic problem of applied psychology" (Gough, 1962, p. 526). Studies
of clinical judgement, which are only one aspect of the 'prediction
problem' discussed at length by Gough (1962), have progressed
through a number of rather distinct stages if viewed in a historical
perspective (Bieri, Atkins, Briar, Leaman, Miller, & Tripodi, 1966).
Research has developed from its roots in introspective analysis 
(Erickson, 1959) to studies of the validity and reliability of 
clinical judgement, clinical versus statistical prediction 
(Meehl, 1954), and on to the most recent stage which is concerned 
with models of decision making within the framework of decision 
theory. In many ways, studies concerned with the validity/ 
reliability of clinical judgement and those concerned with actuarial 
versus clinical predictive validity are similar. Both are concerned 
with improving and/or describing the decision making process 
directly, i.e., in terms of outcomes. The last stage, model building, 
has been an attempt to develop theoretical models of decision 
making or information processing as an indirect attempt to improve 
future decisions (Bieri et al., 1966) rather than to a priori
evaluate present ones.

This literature review will examine clinical judgement from
three perspectives: (1) models of clinical judgement, (2) reliability
and validity of clinical judgement including the actuarial versus
clinical dilemma, and (3) clinical judgement in executive appraisal.

Models of Clinical Judgement

Since the early 1960's the focus of clinical judgement
research has been the nature of the clinical judgement
decision making process itself. Of major concern has been the
development of mathematical models to either explain or improve on 
the actual judgement of the clinician. 

Goldberg (1971) isolates two general models for clinical
decision making: linear and non-linear. The linear model is that
model expressed by a multiple regression analysis and is equivalent
to the formulation of regression weights in order to combine
accurately available information for purposes of prediction.
Non-linear models usually involve some type of moderator variable
effect, i.e., the weighting of one variable will vary in relation
to the magnitude of the difference between two or more other
variables. A number of different types of non-linear models have
been postulated (Goldberg, 1971; Einhorn, 1970, 1971), all involving
some form of moderator variable combination.
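
The distinction can be made concrete with a minimal sketch. The function names, cue values, and weights below are illustrative assumptions and are not taken from Goldberg (1971) or Einhorn (1970, 1971); the moderated rule is only one simple way in which the weight on one cue might depend on the gap between two others.

```python
def linear_model(cues, weights, intercept=0.0):
    """Linear model: a fixed weighted sum of the cues, as in multiple regression."""
    return intercept + sum(w * x for w, x in zip(weights, cues))


def moderated_model(x1, x2, x3, base_weight=1.0, gain=0.5):
    """Hypothetical non-linear rule: the weight on x1 grows with |x2 - x3|."""
    weight_on_x1 = base_weight + gain * abs(x2 - x3)
    return weight_on_x1 * x1


# Same value of x1 throughout; only the spread between x2 and x3 changes.
print(linear_model([2.0, 5.0, 1.0], [0.5, 0.25, 0.25]))  # weights never change
print(moderated_model(2.0, 5.0, 1.0))  # x2 and x3 far apart: x1 weighted heavily
print(moderated_model(2.0, 3.0, 3.0))  # x2 and x3 equal: x1 keeps its base weight
```

In the linear case the contribution of a cue is the same whatever the other cues show; in the moderated case it is not, which is the sense of "moderator variable effect" used above.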

Wiggins & Hoffman (1968) outline an important study which
examines the relative efficacy of three different models of
information combination: the linear, quadratic, and sign models. The
quadratic model is similar to the linear model already described
but includes the squares and cross-products of the original predictors.
The sign model incorporates a linear combination of 70 clinical
signs in relation to MMPI interpretation first described by
Goldberg (1965). Their experiment involved an experimental design
now classic in clinical judgement research. Psychologists were
required to rate MMPI profiles as psychotic or neurotic in a blind
rating fashion. Results indicated the presence of both linear and
configural processing of information by clinicians dependent on
both clinician and subject samples. Clinicians obtained results
which were similar to computer integration of information as per
the three models just described. However, as noted by the authors,
the differences between the results obtained by any of the three
methods of information combination were not great. The simple
linear model combined data in a very efficacious manner.
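
As a rough illustration of what "squares and products" adds to a linear model, the following sketch expands a set of predictor scores into the terms a quadratic model would weight. The function name and the three example scores are hypothetical; the sketch shows only the construction of the terms, not the fitting procedure used by Wiggins & Hoffman (1968).

```python
from itertools import combinations


def quadratic_terms(scores):
    """Return the linear terms, then the squares, then the pairwise cross-products."""
    linear = list(scores)
    squares = [x * x for x in scores]
    products = [a * b for a, b in combinations(scores, 2)]
    return linear + squares + products


terms = quadratic_terms([1, 2, 3])
print(terms)
```

For k predictors the expansion holds k linear terms, k squares, and k(k-1)/2 cross-products, so the number of weights to be estimated grows quickly as predictors are added.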

Goldberg's (1965) study is further supported by Dawes (1972)
and Dawes & Corrigan (1974) who describe two different types of
linear models in an experiment designed to test the ability of
human judges to perform against even random linear models. The
two models, actuarial (based on a regression of the criterion on
the predictors) and bootstrapping (based on a regression of the
judges' prediction on the predictors), were both superior to the
decisions of human judges even when regression weights were assigned
randomly rather than systematically. Experiments cited by the
two authors involved the rating of psychotic versus neurotic on the
MMPI, prediction of graduate school success, and geometric design
estimation. Dawes (1972) summarizes his findings: "If a reasonable
sample of cases exists for which the output values are known, the
best way to make the predictions is to estimate beta weights for
the input variables on the basis of multiple regression; human
judges should be ignored" (p. 3).
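
The difference between the actuarial and bootstrapping models lies only in what is regressed on the predictors. The sketch below uses a single predictor and invented numbers chosen so that the pattern is visible; the data are not from any cited study and the helper names are assumptions made for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Invented toy data: one cue, the true criterion, and one judge's ratings.
cue = [1, 2, 3, 4, 5]
criterion = [1, 3, 2, 5, 4]
judge = [2, 2, 4, 3, 5]

actuarial = fit_line(cue, criterion)  # regress the criterion on the cue
bootstrap = fit_line(cue, judge)      # regress the judge's ratings on the cue

# Predictions from the model of the judge, applied consistently to every case.
model_of_judge = [bootstrap[0] * c + bootstrap[1] for c in cue]

r_judge = pearson(judge, criterion)
r_model_of_judge = pearson(model_of_judge, criterion)
print(actuarial, bootstrap, r_judge, r_model_of_judge)
```

With these made-up values the model of the judge correlates more highly with the criterion than the judge's own ratings do, because the model removes the judge's case-to-case inconsistency. The example demonstrates the mechanics only and is not evidence for the empirical claim.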

Wainer (1976) further reinforces this finding. He indicates 
that, in very general circumstances, little is lost in terms of the 
original data if regression coefficients are estimated rather than 
calculated. 
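
One reading of this finding, taken here as an assumption, is that replacing fitted regression weights with equal weights changes the resulting composite score very little when the predictors are positively related and on a common scale. The sketch below uses invented predictor scores and invented "fitted" weights purely to show how that comparison can be made.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def composite(predictors, weights):
    """Weighted sum across predictors, computed separately for each case."""
    n_cases = len(predictors[0])
    return [sum(w * p[i] for w, p in zip(weights, predictors))
            for i in range(n_cases)]


# Invented scores for six cases on two positively related predictors.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]

fitted = composite([x1, x2], [0.6, 0.3])  # stand-in for calculated weights
unit = composite([x1, x2], [1.0, 1.0])    # equal weights

agreement = pearson(fitted, unit)
print(agreement)
```

The two composites rank the cases almost identically in this toy example, which is the sense in which little is lost.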

Configural processing, best described as a usage of moderator
variables either overtly or covertly, has also commanded considerable
attention in the clinical judgement literature. Hoffman, Slovic, &
Rorer (1968) utilized an ANOVA technique to assess configural
processing in the diagnosis of malignant gastric ulcers using nine
radiologists as clinicians. Although the authors were able to
demonstrate conclusively the reality of configural processing, they
further indicate that even when this processing was utilized by the
clinicians, clinician decision accuracy did not match even that of a
simple linear combinative model.

Einhorn (1972), in an important study involving the clinical 
judgement of malignant cancers, addresses himself to the efficacy of 
combining components of the decision making process rather than the 
binary decisions involved in many of the classic studies in the 
area. He suggests the use of expert clinicians, in their specific 
areas of expertise, and combining these 'mini-decisions' mechanically. 

Shinedling, Howell, & Carlson (1975) combine clinical
'rule of thumb' techniques with statistics to produce a 'clinistics'
model of clinical judgement. They conclude that, "rather than
trying to justify the utility of personal, private judgement,
psychologists should study the contribution of objective clinical
decision-making strategies. Studying 'clinistics' might lead to
new insights and understandings about behavior" (p. 389).

Goldberg (1970) may be getting much closer to the truth when 
he describes his very important study which once again utilizes 
the clinical task of distinguishing psychotic versus neurotic MMPI 
profiles. He concludes that the model that the clinicians actually 
used, when applied systematically and consistently, yielded better 
decisions than did the actual clinicians. The problem with clinicians,
he argues, may not be that they are wrong, but that they are
inconsistent (or human!).

Slovic, Rorer, & Hoffman (1971) carry Goldberg's (1971) 
research one step further. They investigated the reasons why 
clinicians diagnose differentially. In a study involving the 
diagnosis of gastric ulcer malignancy, they attempted to discover 
how each clinician used the various clinical signs available to him.
Their research enables them to trace differential diagnosis back 
to a differential use of clinical signs. They cite that the major 
use of their method is in the opportunity afforded in the 'train-to- 
model' teaching of student clinicians. 

The Validity and Reliability of Clinical Judgement 

Outcome studies in the area of clinical judgement have focused
most directly on the predictive validity of clinical decision making,
be those decisions made clinically or actuarially.

Meehl (1954), in his now classic book, Clinical versus
Statistical Prediction, analyzed previously published studies dealing 
with the validity and reliability (consistency and stability) of 
clinical and actuarial decisions. He summarizes his findings: 

In spite of the defects and ambiguities present, let
me emphasize the brute fact that we have here,
depending upon one's standards for admission as
relevant, from 16 to 20 studies involving a comparison
of clinical and actuarial methods, in all
but one of which the predictions made actuarially
were either approximately equal or superior to
those made by the clinician. (p. 119)

Although attempting to maintain a balanced perspective in 
analyzing the clinical versus actuarial dilemma, Meehl (1954) finds 
himself unavoidably drawn to the side of the actuary. The clinician 
cannot predict at a level that would rival even the most simple 
linear regression equation. Meehl (1954) has been taken to task by 
several other writers because of his handling of the clinical versus 
actuarial problem. 

Holt (1958) rejects as artificial the dichotomy employed by 
Meehl (1954) of clinicians on one hand and actuaries on the other. 
He indicates that clinical judgement must enter the actuarial 
process at frequent intervals. The actuary must still select his 
tests, criterion measures, intervening variables, and psychological 
constructs. How then, Holt (1958) argues, can we even talk of such
a false distinction? Both are merely forms of clinical integration.
In a later treatise, Holt (1970) reaffirms his argument while
concluding that the largely actuarial model does have some place in
combining largely numerical information for purposes of decision
making.

Sawyer (1966) also sees the problem of clinical versus 
actuarial decision making as merely the last half of the problem. 
He indicates that the collection of data can also be considered as 
a clinical or actuarial problem (e.g., the choice to collect test 
or interview data). Sawyer (1966) concludes that the real strength 
of the clinician is in the providing of additional nonpsychometric 
information to the decision making process and not in decision making 
per se. Sawyer (1966) indirectly discounts much of the research 
reviewed by Meehl (1954) by indicating that the paucity of research 
favoring the clinical method derives from the fact that the research 
design utilized in many studies has forced the clinician to play 
the actuarial game (e.g., forced choice responses for ease of
tabulation or the exclusion of nonpsychometric information such as
interview impressions). Holt (1970) reaffirms this view; he says
that studies have yet to look at clinical prediction at its best 
compared with actuarial prediction at its best. 

Meehl (1954) describes four combinations of data and methods of 
obtaining data as (a) psychometric data combined mechanically, 
(b) psychometric data combined nonmechanically, (c) nonpsychometric 
data combined mechanically, or (d) nonpsychometric data combined 
nonmechanically. More complex combinations of these singular
combinations are also possible (e.g., psychometric and nonpsychometric
data combined nonmechanically). However, the bulk of research that
Meehl (1954) reviews would fall into categories (a) and (b);
little evidence is available regarding the more methodologically 
difficult categories or combinations of categories. It seems that 
Meehl (1954) is reviewing studies high in experimental rigor but low 
in ecological validity. 

Holt (1970), in his review of Meehl (1954), Holt (1958),
Sawyer (1966), and more recent clinical and actuarial findings,
concludes that:

(a) When the necessary conditions for setting up a
pure actuarial system exist, the odds are heavy that
it can out-perform clinicians' judgements in predicting
almost anything in the long run if both sides have
access only to quantitative data such as an MMPI
profile. (b) A complete six-step predictive system
is almost always better than a more primitive one, and
even when it seems to be entirely statistical, it
requires the exercise of a great deal of subjective
judgement to work efficiently. (c) Disciplined,
analytical judgement is generally better than global,
diffuse judgement, but it is not any less clinical.
(d) To predict almost any kind of behavior or behavioral
outcome, one does better to assess the situation in
which the behavior occurs in addition to assessing the
actors' personalities. (e) Granted such knowledge
and a meaningful criterion to predict, clinical
psychologists vary considerably in their ability to
do the job, but the best of them can do very well.
That is they do have the skills in assessing
personality by largely subjective, but partly objectifiable
procedures, making use of theories that permit
a deeper and more valid understanding of persons than
anything a statistician can provide. (p. 348)


The real problem of the predictive validity of clinical or
actuarial judgement may be escaping both clinician and actuary. Ash
& Kroeker (1975) review the efficacy of both models of decision making.
They would rate both as low, indicating that a criterion-predictor
match of .60 (high by today's standards for either clinical or
actuarial techniques) is still appallingly low.

Clinical Judgement: Reliability

In comparison to both model building and predictive validity 
studies, a much more limited amount of research has focused on the 
problems associated with the reliability of clinical judgements. 
Goldberg & Werts (1966) cite several types of reliability measures
of interest to clinical judgement researchers: "(a) over time for
the same judges using the same data (stability), (b) over judges,
for the same data from the same occasion (consensus), and (c) over
data sources administered on the same occasion and interpreted by the
same judge (convergence)" (p. 199). Goldberg & Werts (1966)
indicate that problems in any one of these areas or, as is more
likely, in combinations of these areas, pose threats to the validity
of judgement. They see the error covariation across time, sources,
traits, and targets as major limitations in the study of clinical
judgement. They indicate that "no study of the reliability of
clinical inferences is ever likely to provide definitive conclusions"
(p. 200). Sawyer (1966), in discussing the overriding concern
with the validity of clinical judgements, comments that simple
comparisons between combinative models do little to explain or improve
either method.
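
The three reliability measures differ only in which facet is allowed to vary while the others are held constant. The sketch below expresses each as a correlation between two sets of ratings of the same targets; the judge labels, source labels, and rating values are hypothetical, and a simple Pearson correlation is assumed as the index.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical ratings of five targets, keyed by (judge, data source, occasion).
ratings = {
    ("judge_a", "tests", "time_1"): [1, 2, 3, 4, 5],
    ("judge_a", "tests", "time_2"): [1, 2, 3, 5, 4],
    ("judge_b", "tests", "time_1"): [2, 1, 3, 4, 5],
    ("judge_a", "interview", "time_1"): [3, 1, 2, 5, 4],
}

base = ratings[("judge_a", "tests", "time_1")]

# Stability: same judge, same data, different occasion.
stability = pearson(base, ratings[("judge_a", "tests", "time_2")])
# Consensus: different judge, same data, same occasion.
consensus = pearson(base, ratings[("judge_b", "tests", "time_1")])
# Convergence: same judge, same occasion, different data source.
convergence = pearson(base, ratings[("judge_a", "interview", "time_1")])

print(stability, consensus, convergence)
```

A design in which judge and data source change together cannot separate consensus from convergence, since no single pair of rating sets differs on one facet alone.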

The classic study of convergence in clinical judgement was
done by Little & Schneidman (1959). These researchers were concerned
with the convergence of clinical judgement over certain aspects
of a similar data base (psychometric data). Clinicians were required
to rate subjects using a Q-sort technique as either psychotic,
neurotic, psychosomatic or normal on the basis of one of the
Rorschach, Thematic Apperception Test and Make a Person, MMPI or
a combination of several interpretive tests. Their findings, while
disheartening for the clinician, are not altogether unexpected.
They were unable to find a high degree of convergence across similar
aspects of the same data base. The problems in generalizing from
the Little & Schneidman (1959) study are manifold. They are
dealing with a unidimensional data base (psychometric data), are
concerned with unidimensional decision making, and are concerned
with a psychologically "unwell" population.

Goldberg & Werts (1966) utilize a specialized form of
multitrait multimethod clinical judgement research. Clinical
psychologists were required to rate psychiatric patients on four
categories using one of four data sources (MMPI, Rorschach, Wechsler,
or Vocational History). They were unable to find any relationship
between the judgements of one clinician working from one information
source and those of another clinician working from another data
source. This study cannot be considered a real study of convergence
in clinical judgement since it is concerned as much with agreement
across raters (consensus) as it is with agreement across sources
(convergence). This study would probably score low in what Snow
(1974) would call ecological or external validity. There seems to
be a real dissimilarity between experimental tasks and "real"
clinician tasks in real assessment situations. Experimental
clinicians were asked to rate subjects in a manner which was
probably foreign to them and were then chastised for failing to rate
consistently. Sawyer (1966) would see this as a study in which
the clinician was made to play the actuarial game. This threat
to external validity is further magnified by the confounding of
consensus and convergence as reliability measures. How important
is it that the ratings of one clinician from one data source agree
with those of another clinician using a different data source?

Goldberg (1966), in a study of Peace Corps selection board
procedures, evaluated the stability and convergence (inseparably) of
board members' decisions regarding potential applicants. The
relationship of board members' individual decisions before and after
board discussions of the candidates was analyzed. His findings were
that decisions before and after board discussion were highly
correlated, being in the order of .80, but that decisions between
raters were only moderate, being in the order of .40. The study,
although interesting, is difficult to interpret because of the
confounding of stability and convergence. In terms of its external
validity, however, it must be applauded.

Slovic (1966) indirectly addresses himself to the reliability
of clinical judgement, particularly across diverse and multiple
information sources. His findings indicate that, in the prediction
of intelligence, clinicians used only two or three key predictors
even when they were presented with (and believed they used) many.
Additional sources of information were used only when conflicting
information was evident in the prime two or three factors. A threat
to reliability, then, may be the targeting behavior of clinicians
in reference to the information they have available. This is further
confirmed by Perez (1973) in a study involving the discrimination
between different types of criminal test protocols. His research
indicates that additional information has little effect on decision
accuracy (reliability or validity).

The questions of why and how clinical judgements are unreliable 
(or reliable) remain largely unanswered in the literature. It is 
noteworthy that few researchers or studies to date have systematically 
investigated the problems of reliability, particularly convergence, 
preferring to further reinforce the wealth of information available 
in the areas of predictive validity. 

Clinical Judgement in Executive Appraisal 

Although the relationship of clinical judgement research 
to the field of executive appraisal is a logical one, the area has 
been only sparsely researched. Historically, the emphasis taken
in the dearth of information available that deals with executive
appraisal and characteristics of successful executives has focused
on the predictive validity of unimethod (interview) or multimethod
(assessment centers) assessment techniques. Thus, we have seen
very little of model building, as has been the emphasis in clinical
judgement in the areas of clinical psychology, or on reliability, a
point in common between the two areas.

Ulrich & Trumbo (1965) present an excellent and detailed
summary of the personnel selection interview to which the reader is 
referred. Their findings indicate that the low predictive validity 
demonstrated by most assessment interviews may be due to contamina- 
tion of data or criterion problems. They see the major use of the 
interview in assessing personal relations and career satisfaction. 
The lack of sufficient controls on interview research is a concern 
echoed by Mayfield (1964) and Mayfield & Carlson (1972). All of
these researchers agree that the major thrust in interviewing
research should be internal, i.e., "studying the decision making
process as it operates in the selection interview" (Mayfield &
Carlson, 1972, p. 41).

Other studies on the interview have shown low stability of
ratings (Vaughn & Reynolds, 1951) and low inter-rater reliability
(Schwab & Heneman, 1969) on the basis of informal unstructured
interviews. Vaughn & Reynolds (1951) indicate that inter-rater
reliability (consensus) increases as a direct function of interview
structure. Hollman (1972) explains part of the problem regarding
intra- or inter-rater reliability in interviewing, particularly
with respect to threats to validity. He indicates that interviewers
appear unduly swayed by negative client information obtained during
interview and tend to ignore more relevant positive information
obtained at the same time. Langdale & Wertz (1973) add that
inter-rater reliability increases as a function of interviewer knowledge
of the prospective job, adding that unless the interviewer knows
the prospective position thoroughly, inconsistencies are inevitable.

Other researchers, in discussing threats to the reliability 
of the assessment interview (and indirectly, validity), have focused 
on other areas. Baskett (1973) indicates that a major concern 
should be the similarity of interviewer-interviewee attitudes. When 
these attitudes differ markedly, interviewee ratings suffer.
Lipsett (1964) argues for the use of interviewing, saying that much
of what we think we have with personnel tests (validity) never
really existed.

The literature on interviewing in executive appraisal, while 
plentiful, does not answer much in relation to clinical judgement. 
We know only that poor or ineffectual decisions are being made. We 
have little indication of why or where. 

The only area of executive appraisal to which a modified 
form of the clinical judgement research may be applicable is the 
assessment center. The assessment center, first commercially used 
by the American Telephone and Telegraph Company to assess managerial 
performance and potential (Bray & Grant, 1966), is an adaptation 
of German psychologists' procedures for screening officer candidates
(Dunnette, 1971; Blumenfeld, 1971). The assessment center combines 
performance appraisal techniques, such as the interview, paper- 
and-pencil tests, in-basket exercises, leaderless groups, and 
simulation exercises to formulate multitrait ratings of candidates. 
Traditional clinical judgement findings are not directly applicable
here since ratings of several psychologists, managers, or supervisors,
although derived independently, are combined for purposes
of final assessment. Dunnette (1971) describes the relationship of
assessment center findings to behavioral ratings obtained on the
job. Correlations ranged from the low .20's to the high .70's
depending on the trait measured (see Appendix 5).

Bray & Grant (1966) studied an assessment center initiated to 
appraise future managers for the Bell Telephone System. Their 
findings indicated that, although all predictors were used for 
making ratings, considerable inter-rater variability was evident 
in combining the data. In an aspect of the same study, Grant & Bray 
(1969) dealt more specifically with the interview information 
obtained in the assessment center. Their findings indicate that 
structured interviews are able to yield reliable and valid indicators 
of future performance. 

Wollowick & McNamara (1969) in their research which studies the 
use of the assessment center with IBM managers, found that adding 
information received from situational tests increased predictability. 
These researchers also add weight to the actuarial versus clinical 
debate by adding that a statistical combination of the assessment 
center program variables was better than any single subjectively 
derived overall rating. Henrichs (1969), in dealing with the same 
subject pool as Wollowick & McNamara (1969), indicates that a careful 
analysis of employee work records was also highly related to future 
performance. 


Moses (1973), in a more recent study of assessment centers,
reinforces this; he notes increased validity of assessment center
predictions as a function of increasing time between prediction
and evaluation.

Albrecht, Glaser, and Marks (1964) use a multiple assessment 
procedure that is really a forerunner of the assessment center 
approach. They were unable to find significant validity in the 
procedure using a multitrait multimethod matrix approach, but their 
research was hampered by methodological shortcomings. Criterion 
behaviors were evaluated by superiors who had little contact with 
candidates or by peers rather than by direct supervisors. 

Bray & Grant (1966) indicate that many of the key character- 
istics measured by the assessment center can be obtained by an 
interview, a finding suggested by Glaser, Schwarz, and Flanagan (1958), 
but one that is at variance with more disheartening research on the 
assessment interview (Webster, 1964). 

Blumenfeld (1971) sees the greatest benefit in assessment 
center methodology as the equal opportunity afforded candidates, 
use of trained assessors, and situational exercises high in what 
Snow (1974) would call ecological validity. Wilson & Tatge (1973) 
are less optimistic; they see the assessment center approach as 
very costly and not necessarily better than more traditional methods 
of assessment. 

Trankell (1959) describes a study which, although it deals
almost exclusively with predictive validity, is noteworthy in terms
of the present research. In one of the few studies that used
psychologists exclusively as part of an industrial selection
procedure (air pilots), candidates were rated on a 14-variable matrix
on the basis of a clinical integration of paper-and-pencil tests.
In what he describes as a "craftsman's job" (p. 174), Trankell (1959)
describes how the integration of tests by a competent psychologist
yields excellent results in terms of decision accuracy. He argues
for the intelligent use of tests as predictors, indicating that,
rather than arguing relative merits, the strengths of each should
be combined.

Summary: Literature Review

1. The general area of clinical judgement has been well
researched specifically from the perspectives of predictive validity 
and model building. The area of clinical judgement in executive 
appraisal is only sparsely researched and the nature of that research 
has been primarily predictive validity studies of interviewing and 
assessment centers. 

2. Clinical judgements, although they may be configural in
nature, are adequately described by a linear model.

3. The linear model, whether it be used in a bootstrapping
or traditional predictive manner, is at least the most accurate
method of combining mathematically represented information for
decision making. Even when beta weights are estimated or applied
randomly, they better or equal a human judge working with the same
information.


4. There is little research on the reliability of clinical
judgement. This is particularly true of convergence. The reliability
studies that have been done have been concerned with consensus and/or
stability. Convergence studies, when they have been attempted, have
dealt with a similar data base (test or interview) or have been
confounded with stability and/or consensus.

5. The majority of the research on clinical judgement, 
particularly that dealing with model building and predictive validity, 
would rate low in ecological (external) validity (Snow, 1974). 

If one views generalizability as a function of representativeness 
(Snow, 1974), the majority of the studies cited have been well off 
target. Typically, clinicians are required to rate subjects on 
variables that are foreign to them, using criteria and rating 
scales totally alien to their usual method, and are then critiqued 
for off-target behavior. 

6. There exists at present no study which investigates the 
convergence of clinical judgement in a natural setting. This is 
particularly true of a natural, applied, vocational setting. 
Reliability is an extremely important, albeit ignored, concept in 
clinical judgement research (Goldberg & Werts, 1966). It should be 
noted that validity is unknown if the problems of reliability have 
not been accounted for. At present, the apple cart appears to
have usurped the horse!

CHAPTER III
EXPERIMENTAL DESIGN

Clinician Sample 

The three clinicians involved in this study are all profes- 
sional staff of A. W. Fraser & Associates, a medium-sized, locally- 
owned industrial psychology and management consulting firm. 
Clinician #1, the chief psychologist, holds psychologist registra- 
tion in three Canadian provinces, has over 12 years experience in 
executive appraisal and many more years of clinical experience. 
Clinician #2 has a B. A. (Hon) degree in psychology and over five 
years experience in executive appraisal. He was originally trained 
in executive appraisal techniques by Clinician #1 and was supervised 
very closely for the first three years in what might be described 
as an intensive and very highly supervised clinical-industrial 
internship. Clinician #3 is also a registered psychologist and has 
three years experience in industrial and executive appraisal. His 
most recent two years of experience have been obtained as a staff 
member of A. W. Fraser & Associates.

Subject Sample

Subjects utilized consist of recruitment and comprehensive
appraisal candidates processed by the clinicians of A. W. Fraser &
Associates from a time beginning with the inauguration of this
study and ending when each clinician has rated at least twenty
candidates. This covers the period March 1975-December 1975.
Recruitment candidates are those candidates who have applied for
executive positions through the recruiting division of A. W. Fraser &
Associates; comprehensive appraisal candidates are subjects sent
to A. W. Fraser & Associates for assessment by their own companies
in order to assess future development potential within that company.

Procedure

Definition of Traits 

The definitions of the traits or characteristics of concern to the
three clinicians of A. W. Fraser & Associates in assessing executive
talent were arrived at by a process of consensus by the three
clinicians involved. Consensus was obtained on the number and name
of the characteristics that 'make the difference' in executive
performance and on the definition of these characteristics
(Appendix 1). The three rating scales (Appendices 2, 3, & 4)
used to quantify these characteristics had been in informal use
in the organization previously but were modified to encompass the
18 key characteristics arrived at by consensus and the three
information sources (test, interview, & test-interview).

Experimental Procedures

1. After completion of each assessment interview, the 
clinician completed the Interview Rating Form (Appendix 2) for 
the individual interviewed. This completed rating form was imme- 
diately returned to the office secretary for safekeeping and was 
not further available to the clinician. 

2. The subject was administered the following tests as part
of the appraisal battery: Differential Aptitude Test (Verbal and
Abstract); Wonderlic Personnel Test; Watson-Glaser Critical
Thinking Appraisal; Test of Business Judgement; Test of Practical
Judgement; Supervisory Practices Test; Management Aptitude Inventory;
Holland Vocational Preference Inventory; Edwards Personal Preference
Schedule; and the California Psychological Inventory (see Appendix 7
for summary description of tests). These tests comprise the usual
executive assessment test battery utilized by the staff of A. W. Fraser
& Associates; infrequently, additional tests are added to this battery.


3. The clinician was provided with a copy of the profiled
results from all tests administered. Using the test results and
interview impressions, the clinician completed the Interview + Test
Rating Form (Appendix 3) for that candidate. This completed rating
form was immediately returned to the office secretary for safekeeping
and was not further available to the clinician.

4. Approximately two months after the clinician had com- 
pleted his required number of cases, he was provided with the test 
profiles from every subject he had previously rated. These profiles 
were made available to the clinician singly, in random order, and
without identifying demographic information. The clinician then 
completed the Test Rating Form (Appendix 4) for each subject 
individually. This rating form was returned to the office secretary 
who collated the three rating forms from each subject. 

Analysis Procedure

Ratings for each of the 18 characteristic variables (Appendix 1)
for each of the three rater conditions (test, interview, and test +
interview) were analyzed by a one-way analysis of variance with
repeated measures (ANOVA). This was done for each clinician
individually and for all clinicians combined. If an F ratio
obtained exceeded chance, individual comparisons between rater
conditions were undertaken by the Newman-Keuls method of multiple
comparisons. The reliability of the three ratings of each
characteristic (factor) was also calculated as per a procedure
outlined by Winer (1971, p. 290) and Ferguson (1971).
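
The partitioning of variance used in a one-way analysis of variance with repeated measures can be illustrated with a minimal sketch. The function name, the dictionary keys, and the four-subject data set are hypothetical and serve only to show the arithmetic; they are not the ratings or the program used in this study.

```python
def repeated_measures_anova(ratings):
    """One-way repeated-measures ANOVA.

    ratings: list of rows, one row per subject, one column per rating
    condition (e.g. interview, test, combined). Hypothetical helper.
    """
    n = len(ratings)           # number of subjects
    k = len(ratings[0])        # number of rating conditions
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    condition_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Total variation, then the between-subjects and within-subjects parts.
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = ss_total - ss_between

    # Within-subjects variation splits into treatment and residual.
    ss_treatment = n * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_residual = ss_within - ss_treatment

    df_treatment = k - 1
    df_residual = (n - 1) * (k - 1)
    f_ratio = (ss_treatment / df_treatment) / (ss_residual / df_residual)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_treatment": ss_treatment,
        "ss_residual": ss_residual,
        "ss_total": ss_total,
        "df_treatment": df_treatment,
        "df_residual": df_residual,
        "f_ratio": f_ratio,
    }


# Hypothetical ratings: 4 subjects rated under 3 conditions.
example = [[4, 4, 5], [3, 4, 4], [5, 4, 5], [2, 3, 3]]
result = repeated_measures_anova(example)
print(round(result["f_ratio"], 2))
```

The sums of squares returned here correspond to the Between, Within, Treatment, Residual, and Total rows of the analysis of variance tables in Chapter IV.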

Experimental Hypotheses 

There will be no significant differences between the means 
of the results obtained by any of the three assessment methods for 
any of the 18 characteristics for any of the three clinicians. 

Limitations of the Study 

This study is concerned with the convergence of clinical judgement
across information sources with subject and rated characteristics
held constant. Limitations then are limitations imposed by
this restricted perspective.

1. No information will be available regarding the predictive 
validity of clinical judgement. This is not a study of predictive 
validity in clinical judgement, but rather a study of a specialized 
aspect of the process of clinical judgement. 

2. Subjects were not randomly assigned to clinicians. 
Although no overt bias is present in subject assignment at 
A. W. Fraser & Associates, systematic covert bias in subject
assignment cannot be excluded from consideration. In actual practice,
each clinician is assigned to certain specific assignments
based on his time availability and would see all subjects associated 
with that particular assignment. Snow (1974) would see this as the 
compromise that must occur between ecological validity on one hand 
and rigor of experimental design on the other. 

3. Subject (client) selection was not random. Subjects 
can be considered to be representative of the types of clients who 
undertake executive appraisal. 

4. All clinicians are male and all of the subjects are male. 
This may preclude generalizability of results to female populations. 

5. Clinicians are not of equal training and experience. 
Although this has been seldom realized in a study of clinical 
judgement, there is a possible, but undetermined, effect on the 
generalizability of research findings. It is possible to investi- 
gate differences between clinicians but clinician sample size is 
far too small to investigate the effects of clinicians' charac- 
teristics on judgements of subject characteristics. 

6. The possibility of clinicians' remembering profiles from
the test + interview condition when they rated profiles in the
test-only condition is remote. It is, however, a possible weakness of
design. The two-month delay and the volume of work processed in
that two-month period did much to minimize this possibility.

CHAPTER IV
RESULTS

In this chapter, data pertaining to each of the clinicians by 
factor by rating condition interactions are presented. Results are 
organized by factor and are presented for each of the three 
clinicians in each of the three assessment conditions. 

Definition of Terms 

Since several terms will be used extensively in summarizing 
data analysis, a description of these terms, as they apply specifically
to the present study, is given below:

F Ratio. Since the design utilized in this study involves a
one-way analysis of variance with repeated measures, the ratio
F = Mean Square Treatment / Mean Square Residual is appropriate
(Winer, 1971, p. 267).

Significant. Alpha is equal to .05. 

Reliability (R). The reliability coefficient (R) is a
simple proportion which represents the proportion of obtained
variance that is true variance. For example, if R = .80, it means
that 80% of the variation in the measurements is due to variation
in the true score (real differences) with the remaining 20%
variation due to error (Ferguson, 1971).

Unadjusted Reliability - Single Source (R1). The reliability
of one estimate by one clinician of a single factor.

Unadjusted Reliability - Pooled Source (Rk). The reliability
of the mean of the pooled or combined estimates of a single factor
by one clinician. This is frequently referred to as the Spearman-
Brown reliability measure (Winer, 1971, p. 286).

Adjusted Reliability - Single Source (R*1). The reliability
of one estimate by one clinician of a single factor after removal
of mean differences between rating conditions as a source of error
(Winer, 1971, p. 290).

Adjusted Reliability - Pooled Source (R*k). The reliability 
of the mean of the pooled or combined estimates of a single factor 
by one clinician after removal of mean differences between rating
conditions as a source of error (Winer, 1971, p. 290). 

The adjusted reliability coefficients R*1 and R*k are concerned
with pegging or anchoring of the mid-points that a judge or rater
appears to be using in estimating performance or ability on any
given factor or trait. For example, if judges grading ten examination
papers maintain essentially the same rank order so far as their
grades are concerned, but differ in the actual values they assign,
the use of an adjusted reliability estimate just described may be
appropriate. The reliability model which removes mean differences
is used when both means and variances are an important
interpretation consideration from the perspective of error sources.
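
The four coefficients defined above can be computed directly from the sums of squares of a repeated-measures analysis of variance. The sketch below assumes the usual analysis-of-variance form of these estimates: the unadjusted coefficients treat the within-subjects mean square as error, and the adjusted coefficients treat the residual mean square as error, that is, after mean differences between rating conditions have been removed. That this matches the cited procedure in Winer (1971) is an assumption, and the function and argument names are illustrative only. The input figures are those reported for Clinician #1 on Factor 1 in Table 1.

```python
def reliability_estimates(ss_between, df_between, ss_within, df_within,
                          ss_residual, df_residual, k):
    """Return R1, Rk, R*1 and R*k from ANOVA sums of squares.

    k is the number of rating conditions pooled (three in this study).
    Hypothetical helper; assumes the standard ANOVA-based estimates.
    """
    ms_between = ss_between / df_between      # between-subjects mean square
    ms_within = ss_within / df_within         # error for unadjusted estimates
    ms_residual = ss_residual / df_residual   # error for adjusted estimates

    def single(ms_error):
        # Reliability of one rating from one information source.
        return (ms_between - ms_error) / (ms_between + (k - 1) * ms_error)

    def pooled(ms_error):
        # Reliability of the mean of the k pooled ratings.
        return (ms_between - ms_error) / ms_between

    return {
        "R1": single(ms_within),
        "Rk": pooled(ms_within),
        "R*1": single(ms_residual),
        "R*k": pooled(ms_residual),
    }


# Clinician #1, Factor 1 (Table 1): Between 20.18 (df 19),
# Within 12.00 (df 40), Residual 10.77 (df 38), three rating conditions.
estimates = reliability_estimates(20.18, 19, 12.00, 40, 10.77, 38, k=3)
for name, value in estimates.items():
    print(name, round(value, 2))
```

Under these assumptions the rounded values agree with the entries for Clinician #1 in Table 2 (.46, .72, .48, and .73).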

In discussion of reliability in this chapter, the adjusted
reliability (R*1 and R*k) will be used predominantly, although
both adjusted and unadjusted reliability estimates are presented
in table form for reference. For purposes of the discussion of
convergence, each of the reliability estimates just described
(R1, Rk, R*1, and R*k) can be considered as important and will be
presented within the context of interpretation for each factor
individually.

The relationship between R1 and Rk or R*1 and R*k may be
expressed as: Rk = 3R1 / (1 + 2R1) or R*k = 3R*1 / (1 + 2R*1). This
means that as R1 or R*1 approaches one as an absolute value, R1
approaches Rk and R*1 approaches R*k.
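
The relationship between the single source and pooled source coefficients is the Spearman-Brown formula, read as a quotient, with three ratings pooled. A minimal sketch follows; the function name and the default of k = 3 are illustrative, and the input values are the single source coefficients reported for Clinician #1 in Table 2.

```python
def spearman_brown(r_single, k=3):
    """Reliability of the mean of k ratings, given the reliability of one."""
    return k * r_single / (1 + (k - 1) * r_single)


# Clinician #1, Factor 1 (Table 2): R1 = .46 and R*1 = .48.
print(round(spearman_brown(0.46), 2))   # unadjusted pooled estimate, Rk
print(round(spearman_brown(0.48), 2))   # adjusted pooled estimate, R*k

# As the single source reliability approaches one, the pooled value
# approaches the same limit, so the two coefficients converge.
print(spearman_brown(1.0))
```

The gain from pooling is therefore largest when the single source reliability is moderate and shrinks as that reliability approaches one.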

Factor 1: Intelligence

Factor 1 has been defined as "the basic ability to learn and 
understand" (Appendix 1). In this study, aspects of this factor 
are sampled by clinical interpretation of psychometric tests such 
as the Wonderlic Personnel Test, Watson-Glaser Critical Thinking 
Appraisal, and the Differential Aptitude Tests (Abstract and Verbal) 
as well as by interview expertise.¹

Table 1 presents the one-way analysis of variance with repeated 
measures (ANOVA) performed between the results obtained from each 
of the three assessment conditions for each of the clinicians
individually. As is evident from Table 1, there is a significant 
degree of parallelism between the results obtained in each assess- 
ment category. This is true for all three clinicians. None of the 
F ratios obtained are sufficiently large to warrant further 
between-groups comparisons. 

Tables 2 and 3 summarize the reliabilities, means, and standard
deviations associated with Factor 1. As would be expected on the
basis of the previously mentioned F test, there is a marked similarity 
in both the means and standard deviations of the scores in each of 
the three assessment conditions for all three clinicians. 

Clinicians differ markedly in the reliability of their
decisions made with respect to levels of intelligence. Clinicians #1
and #3 obtain a single measure reliability of approximately .50 with
a pooled source Rk greater than .70. The single source R for
Clinician #2 is so low as to cause concern for purposes of prediction.
Even when one pools estimates (R*k), a value of only .32 is
obtained, lower even than the R*1 for either of the other two
clinicians. If this R value is in fact typical for all occasions,
one should anticipate a low predictive validity of intelligence
ratings made across information sources for Clinician #2. One
might expect predictably unpredictable predictions!

¹ On each of the 18 factors, the tests which are indicated as
being clinically combined for purposes of measuring these factors
are as indicated by the three clinicians.

TABLE 1

One Way Analysis of Variance with Repeated Measures

Factor 1: Intelligence

Rater  Source of Variation  Sums of Squares   df     F      p     p*

1      Between                   20.18        19
       Within                    12.00        40
         Treatment                1.23         2    2.17   .13   .16
         Residual                10.77        38
       Total                     32.18        59

2      Between                    9.99        23
       Within                    15.34        48
         Treatment                1.78         2    3.02   .06   .09
         Residual                13.56        46
       Total                     25.33        71

3      Between                   29.43        29
       Within                    18.67        60
         Treatment                1.27         2    2.12   .13   .16
         Residual                17.40        58
       Total                     48.10        89

p* = Conservative probability of F which makes allowances for unequal
covariances among correlated measures.

TABLE 2

Unadjusted and Adjusted Reliability Estimates

Factor 1: Intelligence

Clinician   Source    Unadjusted     Adjusted
                      Reliability    Reliability

1           Single    .46 (R1)       .48 (R*1)
1           Pooled    .72 (Rk)       .73 (R*k)
2           Single    .11 (R1)       .14 (R*1)
2           Pooled    .26 (Rk)       .32 (R*k)
3           Single    .43 (R1)       .44 (R*1)
3           Pooled    .69 (Rk)       .70 (R*k)

TABLE 3

Means and Standard Deviations

Factor 1: Intelligence

Clinician   Rating Condition   Mean   Standard Deviation

1           Interview          4.40         .58
1           Test               4.20         .81
1           Combined           4.55         .74
2           Interview          4.37         .54
2           Test               4.37         .70
2           Combined           4.71         .45
3           Interview          4.50         .50
3           Test               4.53         .76
3           Combined           4.27         .86

Factor 2: Common Sense

Factor 2 is described as "the degree of ability to reach 
quick, practically effective decisions about uncomplicated situa- 
tions where sound judgement depends primarily on accumulated life 
and work experience, established precedent and procedures, etc." 
(Appendix 1). In this study, "common sense" is sampled by the 
clinical interpretation of tests such as Management Aptitude 
Inventory, California Psychological Inventory, and The Test of 
Practical Judgement in addition to interview evaluation. 

Table 4 summarizes the ANOVA pertaining to Factor 2 for each 
of the three clinicians. For Clinicians #1 and #3, the differences 
in the diagnoses made between information sources are not signifi- 
cant. For Clinician #2 the differences in the diagnosis made 
between information sources are significant (F = 4.48, p = .02)
and individual comparisons between groups are warranted. A 
Newman-Keuls multiple comparison between the three means (Winer, 
1971, p. 217) indicates that the mean of the interview group is 
significantly greater than the mean of the test group and that the 
mean of the combined group is also significantly greater than the 
mean of the test group. There is no significant difference 
between the means of interview and combined groups for Clinician #2. 
It appears that subjects rated by Clinician #2 were rated signifi- 
cantly lower in the test condition than in either of the other 
two assessment conditions. 


From Table 6, we see that these mean differences between
groups for Clinician #2, although significant, are not great, being
in the order of .5. It is noteworthy that the standard deviation
of the test condition for Clinician #2 is greater than that observed
in either of the other two assessment conditions. The standard
deviation of the test condition most closely parallels that of
the combined assessment condition where one might expect test
results to exert a moderating influence on the interview impressions.

Reliability values associated with Factor 2 for the three
clinicians are moderate with R*1's in the order of .40 and R*k's in
the order of .68. By more than tripling the amount of time required
for each evaluation, variance error is reduced by approximately
30%. A subject is appraised slightly differently in "common
sense" depending on the assessment condition in which he is viewed.
Particularly with Clinician #2, a candidate might be downrated
somewhat if seen only in the test assessment condition.

TABLE 4

One Way Analysis of Variance with Repeated Measures

Factor 2: Common Sense

Rater  Source of Variation  Sums of Squares   df     F      p     p*

1      Between                   12.85        19
       Within                     9.33        40
         Treatment                 .43         2     .93   .41   [illegible]
         Residual                 8.90        38
       Total                     22.18        59

2      Between                   25.12        23
       Within                    22.66        48
         Treatment                3.69         2    4.48   .02   [illegible]
         Residual                18.97        46
       Total                     47.78        71

3      Between                   47.39        29
       Within                    28.00        60
         Treatment                2.49         2    2.83   .07   .10
         Residual                25.51        58
       Total                     75.39        89

p* = Conservative probability of F which makes allowances for unequal
covariances among correlated measures.

TABLE 5

Unadjusted and Adjusted Reliability Estimates

Factor 2: Common Sense

Clinician   Source    Unadjusted     Adjusted
                      Reliability    Reliability

1           Single    .33 (R1)       .39 (R*1)
1           Pooled    .60 (Rk)       .65 (R*k)
2           Single    .30 (R1)       .35 (R*1)
2           Pooled    .57 (Rk)       .62 (R*k)
3           Single    .45 (R1)       .48 (R*1)
3           Pooled    .71 (Rk)       .73 (R*k)

TABLE 6

Means and Standard Deviations

Factor 2: Common Sense

Clinician   Rating Condition   Mean   Standard Deviation

1           Interview          3.85         .65
1           Test               3.80         .60
1           Combined           4.00         .55
2           Interview          3.62         .56
2           Test               3.12         .88
2           Combined           3.58         .86
3           Interview          3.83         .73
3           Test               3.57         .99
3           Combined           3.43         .95

Factor 3: Oral Communication

Factor 3 is described by the clinicians involved in the study 
as "the degree of clarity and ease with which an individual 
expresses himself in face-to-face discussion" (Appendix 1). In 
this study, aspects of interpersonal effectiveness are sampled by 
interpretation of the California Psychological Inventory: Section I, 
and by interview evaluation. 

As evidenced by Table 7, the F ratios obtained for each of 
the three clinicians were not significant. Variances within groups 
and between groups were essentially the same. From Table 9, we 
see that this similarity is further evidenced by the close 
similarity of means and variances within each clinician cluster.

Reliability coefficients R*1 and R*k are not high, particularly
for Clinicians #2 and #3. Although mean differences between rating 
conditions appear to cancel each other out as evidenced by the low 
F Ratios obtained, the effect of differential rankings on the R*1 
and R*k values is considerable. Particularly for Clinicians #2 and #3, 
the reliability of any single estimate of oral communication ability 
(R*1) is so low as to have a great deal more of the prediction 
accountable for by error than is accountable for by true variation. 

It is noteworthy that, although Clinician #3 indicated that
he could not rate oral communication in the test condition,¹ the
other two clinicians were able to do so with results comparable to
their ratings in the other two assessment conditions. However,
it does not appear that test information regarding oral communication
exerts much of a moderating influence vis-a-vis the distinctions
between interview and combined scores for any clinician; they are
highly parallel.

¹ Clinician #3 did not rate Factor 3 in the test condition. He
indicated that this was not normal procedure for him.

TABLE 7

One Way Analysis of Variance with Repeated Measures

Factor 3: Oral Communication

Rater  Source of Variation  Sums of Squares   df     F      p     p*

1      Between                   16.73        19
       Within                    12.00        40
         Treatment                 .03         2     .05   .95   [illegible]
         Residual                11.97        38
       Total                     28.73        59

2      Between                [illegible]     23
       Within                    29.33        48
         Treatment                 .58         2     .47   .63   .50
         Residual                28.75        46
       Total                  [illegible]     71

3      Between                   20.60        29
       Within                    12.00        30
         Treatment                 .07         1     .16   .69   .69
         Residual                11.93        29

p* = Conservative probability of F which makes allowances for unequal
covariances among correlated measures.

TABLE 8

Unadjusted and Adjusted Reliability Estimates

Factor 3: Oral Communication

Clinician   Source    Unadjusted     Adjusted
                      Reliability    Reliability

1           Single    .39 (R1)       .39 (R*1)
1           Pooled    .66 (Rk)       .64 (R*k)
2           Single    .16 (R1)       .15 (R*1)
2           Pooled    .37 (Rk)       .35 (R*k)
3           Single    .28 (R1)       .27 (R*1)
3           Pooled    .44 (Rk)       .42 (R*k)

TABLE 9

Means and Standard Deviations

Factor 3: Oral Communication

Clinician   Rating Condition   Mean          Standard Deviation

1           Interview          3.55          [illegible]
1           Test               3.60          .66
1           Combined           3.55          .70
2           Interview          3.46          .86
2           Test               3.62          .90
2           Combined           3.67          [illegible]
3           Interview          4.33          .91
3           Test               (not rated)   (not rated)
3           Combined           [illegible]   [illegible]
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Factor 4: Self-Starting Work Drive 


Factor 4 is defined as "the degree to which an individual 
characteristically keeps himself continuously occupied in work 
related activities without need of stimulation from his supervisor" 
(Appendix 1). In this study, aspects of this factor are sampled 
by an interpretation of the Management Aptitude Inventory, 
Vocational Preference Inventory and California Psychological 
Inventory subscales, as well as by Interview evaluations. 

Table 10 summarizes the ANOVA pertaining to Factor 4 for
each of the three clinicians. As is evident, significant F ratios
were obtained for Clinicians #1 and #3. In both cases, a
Newman-Keuls multiple comparison of the mean differences
indicates that the mean of the interview group is significantly
higher than the mean of the test group. For Clinicians #1 and #3,
it seems that candidates impress as having more self-starting
work drive when assessed by interview than when assessed by tests.
There is also more variance in rating this factor in the test
condition, indicating that interview ratings are much more tightly
clustered around the mean values (little inter-individual variation).
R values are acceptably high, with R*1 accounting for approximately
50% of the overall variance in all cases. R*k, which combines
estimates from all rating conditions, improves on R*1 by
approximately 20%. In practice, Factor 4 could probably be rated by any
single method with acceptable results.
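
The gain from pooling can be checked against the Spearman-Brown relation. The sketch below assumes, as the tabled values suggest but this section does not state, that R*k is the stepped-up reliability of k = 3 pooled ratings; the function name `spearman_brown` is illustrative only.

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Reliability of the mean of k ratings, given the reliability of one rating."""
    return k * r_single / (1 + (k - 1) * r_single)

# Adjusted single-condition estimates (R*1) reported for Clinicians #1 and #2
# on Factor 4; k = 3 rating conditions (interview, test, combined).
for r_star_1 in (0.51, 0.55):
    print(f"R*1 = {r_star_1:.2f} -> R*k = {spearman_brown(r_star_1, 3):.3f}")
```

The stepped-up values agree with the tabled R*k entries for these two clinicians to within the rounding of the two-decimal inputs.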
TABLE 10

One Way Analysis of Variance with Repeated Measures

Factor 4: Self-Starting Work Drive

Rater   Source of Variation      Sums of Squares       df        F       p      p*

  1     Between                            37.52       19
        Within                             23.33       40
          Treatment                         5.20        2     5.45    .008     .03
          Residual                         18.13       38
        Total                              60.85       59

  2     Between                            43.99       23
        Within                             21.33       48
          Treatment                         2.19        2     2.64     .08     .12
          Residual                         19.14       46
        Total                              65.32       71

  3     Between                            60.49       29
        Within                             46.00       60
          Treatment                         9.62        2     7.67    .001    .009
          Residual                         36.38       58
        Total                             106.49       89

p* = Conservative probability of F which makes allowances for unequal
covariances among correlated measures.
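
The two probability columns can be approximated from the F ratio alone. The sketch below is a minimal illustration under two assumptions not stated in this section: that p is the ordinary tail probability of F on 2 and (n - 1)(k - 1) degrees of freedom, and that p* is a Geisser-Greenhouse style conservative test that refers the same F to 1 and n - 1 degrees of freedom. The function names are hypothetical.

```python
import math

def p_ordinary_two_df(f_ratio: float, df_residual: int) -> float:
    """Exact upper-tail probability of F with 2 numerator degrees of freedom."""
    return (1 + 2 * f_ratio / df_residual) ** (-df_residual / 2)

def t_density(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def p_conservative(f_ratio: float, n_subjects: int, steps: int = 2000) -> float:
    """Upper-tail probability of F on (1, n - 1) df, using F(1, v) = t(v) squared.

    The area of the t density between 0 and sqrt(F) is found with Simpson's rule
    (steps must be even); the two-sided tail is one minus twice that area.
    """
    df = n_subjects - 1
    upper = math.sqrt(f_ratio)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 1 - 2 * (total * h / 3)

# Clinician #1, Factor 4: F = 5.45, 20 candidates, residual df = 38
print(round(p_ordinary_two_df(5.45, 38), 3), round(p_conservative(5.45, 20), 2))
```

Under these assumptions the conservative probability is always evaluated on fewer degrees of freedom than the ordinary one, which is why a result can be significant under p yet marginal under p*.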
TABLE 11

Unadjusted and Adjusted Reliability Estimates

Factor 4: Self-Starting Work Drive

Clinician   Source   Unadjusted     Adjusted
                     Reliability    Reliability

    1       Single   .44 (R1)       .51 (R*1)
    1       Pooled   .70 (Rk)       .76 (R*k)
    2       Single   .52 (R1)       .55 (R*1)
    2       Pooled   .77 (Rk)       .78 (R*k)
    3       Single   .36 (R1)       .44 (R*1)
    3       Pooled   .63 (Rk)       .70 (R*k)

TABLE 12 


Means and Standard Deviations 


Factor 4: Self-Starting Work Drive 


Clinician Rating Condition Mean Standard Deviation 
1 Interview 3.95 .67
1 Test 3.25 1.18
1 Combined 3.45 .97
2 Interview 2.92 .64
2 Test 3.21 1.08
2 Combined 3.33 1.03
° W 
3 Interview 3.90 0 
3 Test 3.10 1.04 
3 4 PAI} 
2 Combined 3.47 1 
De ee ee 
Factor 5: Interpersonal Effectiveness

Factor 5 is defined as "the level of effectiveness the
individual demonstrates in day-to-day dealings with others with 
regard to gaining and maintaining their respect for his ideas and 
opinions, their confidence in his integrity, and their general 
feeling of good will" (Appendix 1). Aspects of this factor are 
appraised by the California Psychological Inventory, Vocational 
Preference Inventory, Edwards Personal Preference Schedule, 
Management Aptitude Inventory, as well as by interview evaluations.

From Table 13, we see that significant F ratios were 
obtained only for Clinician #2. A Newman-Keuls comparison between 
mean differences indicates that the mean of the interview rating 
condition is significantly higher than both of the other two means. 
Subjects are rated significantly higher in interpersonal effective- 
ness during interview than when they are rated in either the test 
or combined condition. It seems likely that test information 
exerts a moderating influence on the interview evaluations when 
the combined rating is made. Combined ratings more closely parallel 
those of the test condition with respect to the pegging of mean 
values. 

Although the results for Clinician #3 do not indicate a
significant F, R values are very low. This indicates that, although
deviations made over the total group within conditions appear to
cancel one another out, ratings of individuals between conditions
vary greatly. Even the R*k value of .36 is only at a level equal
to the R*1 value for the other two clinicians. More than three
times the effort for Clinician #3 is required to match the
reliability estimate for a single occasion for each of the other two
clinicians. One should anticipate inconsistent predictions on
interpersonal effectiveness for Clinician #3.
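
The "more than three times the effort" figure can be illustrated by inverting the Spearman-Brown relation. This is a sketch only: it assumes the pooled estimates follow Spearman-Brown, and the helper name `ratings_needed` is hypothetical.

```python
def ratings_needed(r_single: float, r_target: float) -> float:
    """Number of pooled ratings needed to lift reliability r_single to r_target."""
    return r_target * (1 - r_single) / (r_single * (1 - r_target))

# Adjusted single-occasion estimates (R*1) on Factor 5, from Table 14:
# Clinician #3 = .16; Clinicians #1 and #2 = .31 and .39.
for target in (0.31, 0.39):
    print(f"target {target:.2f}: {ratings_needed(0.16, target):.2f} ratings")
```

On these figures, Clinician #3 would need roughly 2.4 pooled ratings to match Clinician #1's single-occasion estimate and roughly 3.4 to match Clinician #2's.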


TABLE 13 
One Way Analysis of Variance with Repeated Measures 


Factor 5: Interpersonal Effectiveness 


a cs se a 
nT SS ee ee Ee SP ST 


Rater Source of Variation Sums of Squares df F p p*


1 Between L6uG7 19 
Within L467 LO 

Treatment 63 2 . 86 243 .37 
Residual 14.04 38 
Total 1.33 59 

epee ane sOCdL eee! 5 ery ee Ee oe aa 
2 Between 22% 23 
Within 19.33 4.8 

Treatment 4.08 o 6.16 .004 202 
Residual 15.25 46 
Total 41.88 71 
re Between PALE MS) 29 
Within Doig ANS! 60 

Treatment Dd, WWE 2 DEMO) ola Ly 
Residual 27.18 58 
Total 90.49 89 


p* = Conservative probability of F which makes allowances for unequal 


covariances among correlated measures. 
TABLE 14

Unadjusted and Adjusted Reliability Estimates

Factor 5: Interpersonal Effectiveness

Clinician   Source   Unadjusted     Adjusted
                     Reliability    Reliability

    1       Single   .32 (R1)       .31 (R*1)
    1       Pooled   .58 (Rk)       .58 (R*k)
    2       Single   .32 (R1)       .39 (R*1)
    2       Pooled   .59 (Rk)       .66 (R*k)
    3       Single   .14 (R1)       .16 (R*1)
    3       Pooled   .33 (Rk)       .36 (R*k)
TABLE 15 


Means and Standard Deviations 


Factor 5: Interpersonal Effectiveness 


Clinician Rating Condition Mean Standard Deviation 

1 Interview 3.35 79 
l Test 3.45 .80 
al Combined 3.20 1 
2 Interview 3.54 76 
5 Test Se (O0) 76 
2 Combined age 64 
3 Interview eos i 
3 Test BiG 36 74 

Sia 67 


3 Combined 
Factor 6: Leadership Force

Factor 6 is described as "the amount of influence and dominance 
the individual habitually exerts over groups and persons he 
encounters" (Appendix 1). Aspects of this factor are appraised by 
the California Psychological Inventory, Management Aptitude 
Inventory and by interview evaluations. 

It is encouraging to view the results from the appraisal of
leadership force under each of the three different rating
conditions. Not only are the F ratios small, but reliability measures,
in both the individual and pooled cases, are encouragingly high.
Leadership force appears to be rated symmetrically both between
and within rating conditions. Further, there do not appear to be
any inter-rater differences with respect to the ratings of
leadership force. Means, standard deviations (Table 18), and
reliabilities (Table 17) are highly convergent for all three clinicians.
TABLE 16

One Way Analysis of Variance with Repeated Measures

Factor 6: Leadership Force

Rater   Source of Variation      Sums of Squares       df        F       p      p*

  1     Between                            34.40       19
        Within                             15.33       40
          Treatment                          .63        2      .82     .45     .38
          Residual                         14.70       38
        Total                              49.73       59

  2     Between                            48.44       23
        Within                             23.33       48
          Treatment                         1.19        2     1.24     .30     .28
          Residual                         22.14       46

  3     Between                            71.96       29
        Within                             36.67       60
          Treatment                         1.16        2      .94     .40     .34
          Residual                         35.51       58
        Total                             108.62       89

p* = Conservative probability of F which makes allowances for unequal
covariances among correlated measures.
TABLE 17

Unadjusted and Adjusted Reliability Estimates

Factor 6: Leadership Force

Clinician   Source   Unadjusted     Adjusted
                     Reliability    Reliability

    1       Single   .55 (R1)       .55 (R*1)
    1       Pooled   .79 (Rk)       .79 (R*k)
    2       Single   .53 (R1)       .53 (R*1)
    2       Pooled   .77 (Rk)       .77 (R*k)
    3       Single   .50 (R1)       .50 (R*1)
    3       Pooled   .75 (Rk)       .75 (R*k)

TABLE 18 


Means and Standard Deviations 


Factor 6: Leadership Force 


Clinician Rating Condition Mean Standard Deviation 

1 Interview 3.15 .79
1 Test 3.40 1.02
1 Combined 3.25 .89
2 Interview 2.83 .80
2 Test 2.87 1.01
3 Interview 3.20 1.01
3 Test 3.47 a2 

3 Combined 3.40 1.14
Factor 7: Self-Reliance

Self-reliance is "the degree to which the individual carries
out assigned responsibilities without seeking direction, help,
encouragement and/or reassurance from co-workers" (Appendix 1).
In this study, elements of this factor are assessed by interpretation
of the Edwards Personal Preference Schedule, California
Psychological Inventory, Management Aptitude Inventory, and by interview
evaluation.

Table 19 summarizes the ANOVA done with respect to Factor 7. 
As noted, significant differences between means were observed only 
for Clinician #2. A Newman-Keuls multiple comparison of mean 
differences reveals that the mean of the scores obtained from the 
interview condition is greater than the mean of the scores obtained 
in the test condition. Subjects were typically rated higher in 
self-reliance in the interview condition. Once again, for 
Clinician #2, test results appear to moderate interview impressions 
since the mean of the combined condition is not significantly different
from the mean of the interview condition. 

Reliability measures for clinicians vary considerably for
Factor 7. Both Clinicians #2 and #3 obtain R*k values which are
less than the R*1 value obtained by Clinician #1. With R*1 equal
to approximately .25 for Clinicians #2 and #3, one might expect a
considerable difference in prediction dependent on rating condition.
TABLE 19


One Way Analysis of Variance with Repeated Measures 


Factor 7: Self-Reliance 


ee 
= a rer SES FT i 5 a i PET LSS 2c SE 


Rater Source of Variation Sums of Squares df F p p*


1 Between chai 19 
Within LBs: 40 
Treatment iS) 2 74 -48 40 
Residual Lhd 38 
Total 54.40 59 
eee ee Oe oo oer ae oe, 2 2 Se Se eS Se ee ee 
2 Between 2220s: 23 
Within 28.00 48 
Treatment 4.19 2 4.05 02 .05 
Residual 23.81 46 
Total 50.61 ik 
Seer oe eee. 3 Ola ie terelee — Se eee: te AL ee) Pe ae ee 
3 Between 45.15 p48) 
Within TOA 60 
Treatment 4.69 2 3.00 06 .09 
Residual 49.31 98 
Total OS 26 89 


reer 


p* = Conservative probability of F which makes allowances for unequal 


covariances among correlated measures. 
TABLE 20

Unadjusted and Adjusted Reliability Estimates

Factor 7: Self-Reliance

Clinician   Source   Unadjusted     Adjusted
                     Reliability    Reliability

    1       Single   .50 (R1)       .50 (R*1)
    1       Pooled   .72 (Rk)       .75 (R*k)
    2       Single   .19 (R1)       .23 (R*1)
    2       Pooled   .41 (Rk)       .47 (R*k)
    3       Single   .22 (R1)       .25 (R*1)
    3       Pooled   .46 (Rk)       .50 (R*k)
TABLE 21 


Means and Standard Deviations 


Factor 7: Self-Reliance 


Clinician Rating Condition Mean Standard Deviation 

A Interview STO 46 
i Test 3.65 ae 1a. 
au Combined 3.45 Lee 
2 Interview 3.46 .81
2 Test 2.87 .66
2 Combined 3.08 .91
3 Interview 3.50 .81
3 Test 3.00 1.18
3 Combined 3.03 .98
Factor 8: Adaptability

Adaptability is defined as "the level of ability to cope 
comfortably with new and changing circumstances" (Appendix 1). In 
this study, aspects of this factor are appraised by tests such as 
the Edwards Personal Preference Schedule, the California Psycho- 
logical Inventory, Vocational Preference Inventory, as well as by 
interview evaluations. 

Table 22 summarizes the ANOVA relevant to Factor 8. As
noted, no significant differences are evident, save for Clinician #3.
A Newman-Keuls multiple comparison between mean differences for
Clinician #3 indicates that the mean of the interview condition is
significantly higher than the mean of the scores in either of the
remaining two categories. In the same manner as was evident for
Clinician #2 on Factors 7 and 5 and for Clinician #3 on Factor 4,
the test protocols appear to exert a moderating influence on
interview evaluations when a combined rating is undertaken.

The significant mean difference evidenced by Clinician #3
is combined with a low reliability (R*1 = .23), indicating the very
real possibility of differential diagnosis depending on the rating
condition. For Clinician #1, although mean differences do not
appear to be a large error source, considerable differences in
ranking are apparent, as reflected in the low value of R*1 = .30,
which is independent of the similarity or difference of mean pegging
between groups. Clinician #2 obtained an R*1 value which is
considerably higher than even the R*k value for the other two
clinicians. His single estimate of adaptability is encouragingly
high and little is gained by combining all three methods.
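
Why the adjusted estimate is independent of mean pegging can be seen from how the two coefficients appear to be built from the ANOVA. The sketch below assumes an intraclass-correlation form that is not spelled out in this section but that reproduces the tabled values for Clinician #1 on Factor 4 (Tables 10 and 11). The unadjusted estimate charges differences between condition means to error, while the adjusted estimate removes them. The function name is illustrative.

```python
def anova_reliabilities(ss_between, ss_treatment, ss_residual, n, k):
    """Return (unadjusted, adjusted) single-rating reliabilities from ANOVA sums of squares."""
    ms_between = ss_between / (n - 1)                          # between candidates
    ms_within = (ss_treatment + ss_residual) / (n * (k - 1))   # condition means left in error
    ms_residual = ss_residual / ((n - 1) * (k - 1))            # condition means removed
    unadjusted = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    adjusted = (ms_between - ms_residual) / (ms_between + (k - 1) * ms_residual)
    return unadjusted, adjusted

# Clinician #1, Factor 4 (Table 10): 20 candidates, 3 rating conditions
r1, r1_adj = anova_reliabilities(37.52, 5.20, 18.13, n=20, k=3)
print(round(r1, 2), round(r1_adj, 2))
```

Because the treatment sum of squares enters only the unadjusted coefficient, a clinician can show no significant mean differences and still obtain a low R*1 when candidates are ranked differently from one condition to the next.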
TABLE 22


One Way Analysis of Variance with Repeated Measures 


Factor 8: Adaptability 


a: aa RU eI: Ste tase esse oem 
SS 


Rater Source of Variation Sums of Squares df F p p*


de Between Lise 19 
Within Loser 40 
Treatment heya) D2 ioe BUA} ae 4) 
Residual 15:43 38 
Total 34..18 59 


2 Between La awl 23 
Within 18.00 48 
Treatment i386 2 PRANENE BOS my. 
Residual 16). 1h 46 
Total 65). ieh 7 
3 Between 41.29 IRS) 
Within SM rates) 60 
Treatment 13279 2 Ss) -0003 .005 
Residual 43.58 98 
Total 98562 89 


p* = Conservative probability of F which makes allowances for unequal 


covariances among correlated measures. 
TABLE 23

Unadjusted and Adjusted Reliability Estimates

Factor 8: Adaptability

Clinician   Source   Unadjusted     Adjusted
                     Reliability    Reliability

    1       Single   .29 (R1)       .30 (R*1)
    1       Pooled   .55 (Rk)       .56 (R*k)
    2       Single   .60 (R1)       .62 (R*1)
    2       Pooled   .82 (Rk)       .83 (R*k)
    3       Single   .14 (R1)       .23 (R*1)
    3       Pooled   .33 (Rk)       .47 (R*k)

TABLE 24 


Means and Standard Deviations 


Factor 8: Adaptability 


Clinician Rating Condition Mean Standard Deviation
i Interview 3.30 64 
il Test 3.10 .70 
L Combined #95 . 86 
2 Interview 3134 .85 
2 Test 3.04 1.06 
2 Combined 2.96 89 
3 Interview 3.90 p07 
3 Test 3.00 - 86 


3 . Combined oa eae 
Factor 9: Potential for Growth

Potential for growth is defined as "the degree of probability 
that an individual will develop the personal resources to cope with 
increasingly more complex and responsible work roles" (Appendix 1). 
In this study, potential for growth is appraised by a clinical 
integration of all information obtained by testing plus interview 
evaluations. 

On evaluating the observations in Table 25 which summarizes 
the ANOVA for Factor 9, we see that a significant difference 
exists between the means of the three assessment conditions for 
Clinician #3. A Newman-Keuls multiple comparison of mean differences 
indicates that, as was the case for Clinician #3 on Factor 4, the 
mean of the interview assessment condition is significantly higher 
than the mean of the test assessment group. Once again, we see 
the moderating effect of test information on interview evaluations 
when rating in the combined condition. In the cases of Clinicians 
#1 and #2, a high degree of similarity is evident across rating 
conditions; no significant differences are evident. 

Coupled with the significant differences in mean rating 
demonstrated by Clinician #3, we see a low R*1 associated with the 
estimation of Factor 9. Once again, the use of all three methods 
in obtaining an R*k = .65 for Clinician #3 only approximates the 
single source estimates obtained by Clinicians #1 and #2. 
Reliability estimates for Clinicians #1 and #2 are much more
independent of assessment condition.
TABLE 25


One Way Analysis of Variance with Repeated Measures 


Factor 9: Potential for Growth 


i ee 
a ee ee ees 


Rater Source of Variation Sums of Squares df F p p*


1s Between Soars 19 
Within IZ OKG; LO 

Treatment 63 ) 1.06 .36 2 
Residual 11.37 38 
Total bi. 73 59 

Bene ee Pee OCA ee eee ee ef ee ee ee ae eee, ee 
2 Between 48.65 a 
Within 205,00 48 

Treatment LAOS 2 IPAS SEXO) .28 
Residual 18.97 46 
Total 68.65 Wa 
a Between 38.99 29 
Within 31.33 60 

Treatment 3.90 2 ela 402 205 
Residual 27.44 98 
Total Wa SZ 89 


p* = Conservative probability of F which makes allowances for unequal 


covariances among correlated measures. 
TABLE 26

Unadjusted and Adjusted Reliability Estimates

Factor 9: Potential for Growth

Clinician   Source   Unadjusted     Adjusted
                     Reliability    Reliability

    1       Single   .61 (R1)       .61 (R*1)
    1       Pooled   .83 (Rk)       .83 (R*k)
    2       Single   .58 (R1)       .58 (R*1)
    2       Pooled   .80 (Rk)       .80 (R*k)
    3       Single   .34 (R1)       .38 (R*1)
    3       Pooled   .61 (Rk)       .65 (R*k)
TABLE 27 


Means and Standard Deviations


Factor 9: Potential for Growth 


Clinician Rating Condition Mean Standard Deviation 

us Interview 2.90 n62 
A Test 2.75 Behe, 
ue Combined 2.60 91 
2 Interview 3.08 86 
2 Test 2.92 95 
2 Combined 3.21 1.08 
3 Interview 3,97 ag 
3 Test eee t 89 

3.40 92 


3 Combined 
Factor 10: Readiness to Learn

Readiness to learn is defined as "the individual's willingness 
to acquire new information, explore new ideas, methods, tasks, etc." 
(Appendix 1). In this study, it is appraised by tests such as the 
California Psychological Inventory, Vocational Preference Inventory, 
Wonderlic, and the Differential Aptitude Tests as well as by 
interview evaluations. 

From an examination of Table 28, it appears that all clinicians
experienced more difficulty in the rating of Factor 10 than they
did with many of the other factors. Significant differences
between rating conditions were evident for all three clinicians.
A Newman-Keuls multiple comparison between means for each of the
clinicians reveals considerable similarity in the differences
exhibited. For Clinicians #1 and #2, the mean of the interview
condition is significantly higher than the mean of the test
condition. For Clinician #3, the mean of the interview condition is
significantly greater than both the mean of the test condition and
the mean of the combined condition. Table 30 indicates that for
all three clinicians, test results appear to be moderating
interview impressions in the combined rating condition. For Clinician #3,
this moderating effect is not great, resulting in the additional
significant difference between interview and combined mean ratings.

Although significant F ratios were obtained for all clinicians,
reliability estimates are not so uniform. Clinicians #1 and #2
parallel each other, obtaining an R*1 value of approximately .47.
Clinician #3, as has been the case on Factors 8, 7, 5, and 3,
obtains an R*1 value approximately one-half that of his counterparts.
His degree of convergence between ratings is low.
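
The link between the condition means in Table 30 and the F ratios in Table 28 can be checked directly. The sketch below assumes 20 candidates for Clinician #1 (implied by the 19 between-subject degrees of freedom) and takes the residual sum of squares (13.87 on 38 df) from Table 28. The function names are illustrative.

```python
def treatment_ss(means, n):
    """Sum of squares between rating conditions, from the condition means."""
    grand_mean = sum(means) / len(means)
    return n * sum((m - grand_mean) ** 2 for m in means)

def f_ratio(ss_treatment, df_treatment, ss_residual, df_residual):
    """Repeated-measures F: treatment mean square over residual mean square."""
    return (ss_treatment / df_treatment) / (ss_residual / df_residual)

# Clinician #1, Factor 10: interview, test and combined means (Table 30)
means = [3.35, 2.85, 2.95]
ss_t = treatment_ss(means, n=20)
print(round(ss_t, 2), round(f_ratio(ss_t, 2, 13.87, 38), 2))
```

The resulting F agrees with the ratio tabled for Clinician #1. The same check can be applied to any clinician whose means and residual term are legible.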
TABLE 28
One Way Analysis of Variance with Repeated Measures 


Factor 10: Readiness to Learn


See 
ne oe ee eh ee 


Rater Source of Variation Sums of Squares df F p p* 


1 Between 24.18 19
Within 16.67 40
Treatment 2.80 2 3.84 .03 .06
Residual 13.87 38
Total 40.85 59
ee ee ep Nek om ee Oe 
2 Between 41.99 23 
Within 26.00 48 
Treatment LARC) 2 4.85 =O .03 
Residual palahy| 46 
3 Between . Shilo 29 
Within Seeks 60 
Treatiene 11.35 2 6.86 002 01 
Residual nae 38 
Tote) 110.45 89 


p* = Conservative probability of F which makes allowances for unequal 


covariances among correlated measures. 


TABLE 29

Unadjusted and Adjusted Reliability Estimates

Factor 10: Readiness to Learn

Clinician   Source   Unadjusted     Adjusted
                     Reliability    Reliability

    1       Single   .41 (R1)       .45 (R*1)
    1       Pooled   .67 (Rk)       .71 (R*k)
    2       Single   .44 (R1)       .49 (R*1)
    2       Pooled   .70 (Rk)       .74 (R*k)
    3       Single   .21 (R1)       .27 (R*1)
    3       Pooled   .44 (Rk)       .53 (R*k)
TABLE 30

Means and Standard Deviations

Factor 10: Readiness to Learn

Clinician   Rating Condition   Mean   Standard Deviation

    1       Interview          3.35          .65
    1       Test               2.85          .85
    1       Combined           2.95          .86
    2       Interview          3.83         1.07
    2       Test               3.25          .92
    2       Combined           3.37          .81
    3       Interview          3.97          .99
    3       Test               2.93         1.09
    3       Combined           2.73         1.06
Factor 11: Management Level Planning and Problem Solving

Factor 11 is described as "the individual's ability to
recognize the full depth and breadth of situations and problems and to
consider the longer range, as well as the here-and-now consequences
of their change or resolution" (Appendix 1). In this study,
Factor 11 is appraised by the Watson-Glaser Critical Thinking
Appraisal, Differential Aptitude Tests: Verbal and Abstract,
California Psychological Inventory, Edwards Personal Preference
Schedule, as well as by interview evaluation.

In comparison to the results for Factor 10 just presented,
the results for Factor 11 are encouraging. From Table 31, we see
that, for each of the three clinicians, no means are
significantly different from each other. There is a high degree
of convergence within each clinician by rating condition cluster.
Reliability estimates for Factor 11 are also very respectable,
with values of R*1 approximating .55 for all clinicians. Apparently,
both in terms of mean variation and intra-rating condition
convergence, Factor 11 is regarded similarly by all three clinicians
for all three ratings.




TABLE 31 


One Way Analysis of Variance with Repeated Measures 


Factor 11: Management Problem Solving 


NS aaa saree ss a a ee 
a Se eee eee ee eee 


Rater Source of Variation Sums of Squares df F p p*


ik Between 58.27 19 
Within Br Or/ 40 

Treatment 52S Z 5 le .79 BOS 
Residual 18.43 38 
Total 76.93 59 

Peer ter eee LOLA), ie PR en YS a a ee ee ee 
2 Between $5.83 23 
Within 34.67 48 

Treatment HONS) 2 .39 .68 oy 
Residual 34.08 46 
Total 120.00 71 

Pee eer Pe OTe perweees oo = SS ee SO ER fs oh nee 
3 Between 92.90 29 
Within 42.00 60 

Treatment 220 Z 14 687. a7 
Residual 41.80 98 


p* = Conservative probability of F which makes allowances for unequal 


covariances among correlated measures. 




TABLE 32

Unadjusted and Adjusted Reliability Estimates

Factor 11: Management Problem Solving

Clinician   Source   Unadjusted     Adjusted
                     Reliability    Reliability

    1       Single   .65 (R1)       .64 (R*1)
    1       Pooled   .85 (Rk)       .84 (R*k)
    2       Single   .58 (R1)       .57 (R*1)
    2       Pooled   .81 (Rk)       .80 (R*k)
    3       Single   .54 (R1)       .53 (R*1)
    3       Pooled   .78 (Rk)       .77 (R*k)
TABLE 33


Means and Standard Deviations 


Factor 11: Management Problem Solving 


Clinician Rating Condition Mean Standard Deviation 
a Interview mis . 80 
L Test 2ice 1.24 
a Combined 2.40 Isi26 
2 Interview 2.879 . 88 
2 Test 3.04 1.40 
2 Combined 3308 Hie souls) 
3 Interview Pa | RoW 
3 Test 2.67 heh) 


3 Combined 2.67 138 
Factor 12: General Energy Level

General energy level is "the level of physical vigor and 
vitality the individual will demonstrate in his everyday conduct" 
(Appendix 1). This factor is sampled by the Edwards Personal 
Preference Schedule, California Psychological Inventory, Management 
Aptitude Inventory as well as by interview evaluations. 

Except for Clinician #3, clinicians do not differ in their
mean ratings between rating conditions for Factor 12. The significant
difference between mean scores for Clinician #3, which is summarized
in Table 34, is again indicative of a difference in mean pegging
across rating conditions. A Newman-Keuls multiple comparison
between the results of Clinician #3 indicates that, as was the case
on many other factors, the mean of the interview rating condition
is significantly higher than the mean of the test condition.
Apparently, candidates are rated "more generously" in the interview
condition than they are in the test condition.

Although Clinician #3 differs from his two counterparts in
mean differences between rating conditions, he differs very little
in obtained reliability estimates on Factor 12. All clinicians
obtain R*1 values of approximately .28, indicating considerable
ranking differences between rating conditions. With such a low R*1
value, differential diagnosis is a considerable possibility.
TABLE 34
One Way Analysis of Variance with Repeated Measures 


Factor 12: General Energy Level 


a aa ae Eee 
SS aaa eases ESE EE Ee eae 


Rater Source of Variation Sums of Squares df F p p*


i Between ABO) 19 
Within | 10.266 40 

Treatment 70 2 1.33 2, =20 
Residual 9.97 38 
Total 20.85 99 

ee OL. he es eer i ae ee ee, EE ey ee eee 
2 Between 23.99 23 
Within : 21433 48 

Treatment Aol ae airelte™ BENS 45 
Residual 207 80 46 

ae a Totalinterwiews 2 57 PRR os Re WWE See 
3 Between 23.0080 29 
Within 24.67 60 

Treatment 2100 2 3.42 O04 SO 
Residual 2270) 58 
Total 48.50 89 


p* = Conservative probability of F which makes allowances for unequal 


covariances among correlated measures. 
TABLE 35

Unadjusted and Adjusted Reliability Estimates

Factor 12: General Energy Level

Clinician   Source   Unadjusted     Adjusted
                     Reliability    Reliability

    1       Single   .25 (R1)       .26 (R*1)
    1       Pooled   .50 (Rk)       .51 (R*k)
    2       Single   .31 (R1)       .30 (R*1)
    2       Pooled   .57 (Rk)       .57 (R*k)
    3       Single   .25 (R1)       .28 (R*1)
    3       Pooled   .50 (Rk)       .54 (R*k)

TABLE 36 


Means and Standard Deviations 


Factor 12: General Energy Level 


a 


Clinician Rating Condition Mean Standard Deviation 
uy Interview 3565 48 
a Test 3.40 5 One) 
ay Combined 3.60 .66 
0 Interview 3.58 76 
> eee 3250 64 
2 Combined 3.71 93 
3 Interview 4.07 -68 
3 Test Bolo y/ 5 KO) 


a Combined 3°77 x18 
Factor 13: Efficiency of Application

Efficiency of application is defined as "the economic and
productive organization and application of work time and effort"
(Appendix 1). It is sampled by the Management Aptitude Inventory,
California Psychological Inventory, Vocational Preference Inventory,
Test of Practical Judgement, as well as by interview evaluations.

Table 37 summarizes the ANOVA associated with Factor 13 for
all three clinicians. As is evident from the table, there are no
significant differences within each clinician by rating condition
cluster. Table 38 summarizes the intra-rater reliability estimates
for Factor 13. For Clinicians #1 and #3, it appears that the absence
of significant mean differences between rating conditions is
complemented by substantial R*1 values approximating .50.
Clinician #2, however, does not match this level of convergence,
obtaining an R*1 value of only .16. This value would make the
reliability of any individual decision, based on any one rating
condition, tenuous. As should be expected, R*k values are in
close correspondence with those obtained for R*1. However, even
the R*k value of .36 for Clinician #2 is of concern for prediction
purposes.
TABLE 37


One Way Analysis of Variance with Repeated Measures 


Factor 13: Efficiency of Application 


cre eg ee ee ee ee 
ee ee ee 


Rater Source of Variation Sums of Squares df F p p*


uy Between 24.40 19 
Within 14.00 40 
Treatment ~40 2 .30 08 ~46 
Residual 13.60 38 
Peewee ne Stal... ee Wes a ee Ee Se ad ae ee 
2 Between v4 6 8 23 
Within 26.67 48 
Treatment 86 2 AY iA 47 .39 
Residual 25.80 46 
3 Between 66.99 23 
Within 28.67 60 
Treatment 69 2 ~71 ~49 ~40 
Residual oe 98 
Total 95.65 89 


p* = Conservative probability of F which makes allowances for unequal 


covariances among correlated measures. 
TABLE 38

Unadjusted and Adjusted Reliability Estimates

Factor 13: Efficiency of Application

Clinician   Source   Unadjusted     Adjusted
                     Reliability    Reliability

    1       Single   .47 (R1)       .46 (R*1)
    1       Pooled   .73 (Rk)       .72 (R*k)
    2       Single   .16 (R1)       .16 (R*1)
    2       Pooled   .37 (Rk)       .36 (R*k)
    3       Single   .56 (R1)       .56 (R*1)
    3       Pooled   .79 (Rk)       .79 (R*k)

TABLE 39 


Means and Standard Deviations 


Factor 13: Efficiency of Application 


NS SE RS SS RR A RA RR A RS A 


Clinician Rating Condition Mean Standard Deviation 
1 Interview 3.70 .64
1 Test 3.50 .97
a Combined 3.60 paris 
2 Interview 3.17 .80
2 Test 2.92 .86
2 Combined 2.96 .73
3 Interview 3.20 .79
3 Test 3.00 1.18
3 Combined 3.17 1.10
Factor 14: Self-Confidence

Self-confidence is described as "the degree of basic security 
the individual feels in his own ability to deal adequately with most 
situations and people he encounters" (Appendix 1). This factor is 
sampled by the California Psychological Inventory, Edwards Personal 
Preference Schedule, Management Aptitude Inventory and by interview 
evaluations. 

As summarized in Table 40, significant differences between
rating condition means are evident for Clinicians #2 and #3. A
Newman-Keuls multiple comparison between mean differences reveals
that, for both clinicians, the mean of the interview rating condition
is significantly higher than the mean of the test rating
condition. For both of these clinicians, test results moderate
interview evaluations to yield a combined rating lower than the
interview rating but higher than the test rating. This is not
the case for Clinician #1 where, if anything might be said about
the statistically insignificant differences, it should be that
interview evaluations moderate higher test ratings.

Reliability estimates for all three clinicians range
from barely acceptable (R*1 = .34) to quite credible (R*1 = .55).
Roughly 20% of the error variance vis-a-vis reliability is controlled
by averaging the results from all three procedures rated
independently (R*k).
TABLE 40

One Way Analysis of Variance with Repeated Measures

Factor 14: Self-Confidence

Rater   Source of Variation      Sums of Squares       df        F       p      p*

  1     Between                            22.93       19
        Within                             10.67       40
          Treatment                          .90        2     1.75     .19     .20
          Residual                          9.77       38
        Total                              33.60       59

  2     Between                            26.00       23
        Within                             24.00       48
          Treatment                         3.58        2     4.04     .02     .06
          Residual                         20.42       46
        Total                              50.00       71

  3     Between                            35.65       29
        Within                             25.33       60
          Treatment                         2.82        2     3.63     .03     .07
          Residual                         22.51       58
        Total                              60.99       89

p* = Conservative probability of F which makes allowances for unequal
covariances among correlated measures.
TABLE 41

Unadjusted and Adjusted Reliability Estimates

Factor 14: Self-Confidence

Clinician   Source   Unadjusted     Adjusted
                     Reliability    Reliability

    1       Single   .54 (R1)       .55 (R*1)
    1       Pooled   .78 (Rk)       .79 (R*k)
    2       Single   .30 (R1)       .34 (R*1)
    2       Pooled   .56 (Rk)       .61 (R*k)
    3       Single   .39 (R1)       .42 (R*1)
    3       Pooled   .66 (Rk)       .68 (R*k)

TABLE 42 


Means and Standard Deviations 


Factor 14: Self-Confidence 


Clinician Rating Condition Mean Standard Deviation 
i Interview S570 .64 
uk Test 4.00 -89 
i Combined 3.85 ‘2. 365 
2 Interview 3.75 83 
2 Test Saeed 76 
2 Combined 3.94 81 
3 Interview 4.43 80 
3 Test +. 00 86 
3 Combined phe 75 
Factor 15: Supervisory Effectiveness


Supervisory effectiveness refers to "the individual's 
habitual effectiveness in directing, co-ordinating, and controlling 
subordinates in standard work settings" (Appendix 1). This factor 
is appraised by the California Psychological Inventory, Edwards 
Personal Preference Schedule, Management Aptitude Inventory, 
Supervisory Practices Test, Test of Practical Judgement and 
interview evaluations. 

As has been the case for many other factors, there is a high
convergence of mean ratings within each clinician by rating condition
cluster. Candidates appear to be rated on the same yardstick in
each of the three rating conditions.

If, as noted above, candidates are being rated with the same
yardstick in each rating condition, they are not measured identically
in each case. Individual case (R*1) reliability estimates
for Clinicians #1 and #2 are only low to moderate (R*1 = .40 and .30).
Clinician #3, however, is remarkably consistent in his ratings of
supervisory effectiveness between rating conditions (R*1 = .56).
For him, the possibility of differing diagnosis as a function of
assessment condition is reduced.
TABLE 43

One Way Analysis of Variance with Repeated Measures

Factor 15: Supervisory Effectiveness

Rater   Source of Variation   Sums of Squares   df   F             p             p*

1       Between               27.88             19
        Within                20.97             40
        Treatment              1.20              2   [illegible]   [illegible]   [illegible]
        Residual              19.77             38
        Total                 48.85             59

2       Between               21.78             23
        Within                20.66             48
        Treatment              1.36              2   [illegible]   [illegible]   [illegible]
        Residual              19.30             46
        Total                 42.44             71

3       Between               67.83             29
        Within                28.67             60
        Treatment               .07              2   [illegible]   [illegible]   [illegible]
        Residual              28.60             58
        Total                 96.50             89

p* = Conservative probability of F which makes allowances for unequal
covariances among correlated measures.
TABLE 44

Unadjusted and Adjusted Reliability Estimates

Factor 15: Supervisory Effectiveness

Clinician   Source   Unadjusted           Adjusted
                     Reliability          Reliability

1           Single   .38 (R1)             .40 (R*1)
1           Pooled   [illegible] (Rk)     .67 (R*k)
2           Single   .29 (R1)             .30 (R*1)
2           Pooled   .55 (Rk)             .56 (R*k)
3           Single   .56 (R1)             .56 (R*1)
3           Pooled   [illegible] (Rk)     .79 (R*k)
TABLE 45

Means and Standard Deviations

Factor 15: Supervisory Effectiveness

Clinician   Rating Condition   Mean          Standard Deviation

1           Interview          3.30          .78
1           Test               3.65          1.01
2           Interview          2.87          [illegible]
2           Test               [illegible]   .76
2           Combined           [illegible]   [illegible]
3           Interview          2.87          .72
3           Test               2.83          [illegible]
3           Combined           2.80          1.08
Factor 16: Autonomy

Autonomy is described as "the degree of the individual's 
need to make his own decisions, regulate his own behavior, be his 
own boss, etc." (Appendix 1). This factor is appraised by the 
Edwards Personal Preference Schedule, California Psychological 
Inventory and interview evaluations. 

Table 46 indicates that there is a similarity in mean ratings
between rating conditions for Clinicians #1 and #2. Clinician #3,
however, obtains a very significant F ratio indicating differences
between rating conditions with respect to mean ratings. A
Newman-Keuls multiple comparison between the means of the three rating
conditions indicates that the mean of the interview condition is
significantly greater than the mean of the test and combined
conditions. Test results appear to moderate interview evaluations
very considerably when rating in the combined condition.

Although there are significant mean differences between rating
conditions for Clinician #3, we see from Table 47 that, once mean
differences are removed as a source of error, his convergence
estimate (R*1) is considerably higher than that observed for the
other two clinicians. This points up the necessity of considering
both mean differences and intra-rater reliability when discussing
convergence.
TABLE 46

One Way Analysis of Variance with Repeated Measures

Factor 16: Autonomy

Rater   Source of Variation   Sums of Squares   df   F             p             p*

1       Between               22.98             19
        Within                20.00             40
        Treatment              1.23              2   1.25          [illegible]   .28
        Residual              18.76             38
        Total                 42.98             59

2       Between               32.65             23
        Within                28.67             48
        Treatment              2.03              2   [illegible]   [illegible]   .20
        Residual              26.64             46
        Total                 61.32             71

3       Between               61.29             29
        Within                40.67             60
        Treatment              5.36              2   4.40          .02           .04
        Residual              35.31             58
        Total                101.96             89

p* = Conservative probability of F which makes allowances for unequal
covariances among correlated measures.
TABLE 47

Unadjusted and Adjusted Reliability Estimates

Factor 16: Autonomy

Clinician   Source   Unadjusted           Adjusted
                     Reliability          Reliability

1           Single   [illegible] (R1)     .33 (R*1)
1           Pooled   [illegible] (Rk)     .59 (R*k)
2           Single   [illegible] (R1)     .33 (R*1)
2           Pooled   .58 (Rk)             .59 (R*k)
3           Single   [illegible] (R1)     .45 (R*1)
3           Pooled   .68 (Rk)             .71 (R*k)

TABLE 48

Means and Standard Deviations

Factor 16: Autonomy

Clinician   Rating Condition   Mean          Standard Deviation

1           Interview          3.35          [illegible]
1           Test               3.20          [illegible]
1           Combined           3.00          .89
2           Interview          3.08          [illegible]
2           Test               [illegible]   .93
2           Combined           2.75          [illegible]
3           Interview          3.37          1.05
3           Test               2.83          1.00
3           Combined           2.87          [illegible]
Factor 17: Responsibility

Factor 17 refers to "the degree to which the individual lives 
up to personal, professional, and business obligations he has tacitly 
or otherwise accepted" (Appendix 1). This is assessed by an inter- 
pretation of the California Psychological Inventory, Management 
Aptitude Inventory, Edwards Personal Preference Schedule and 
interview evaluations. 

Table 49 summarizes the F tests associated with Factor 17.
The results of Clinician #1 indicate a marginally significant
difference between the means of the three assessment conditions.
However, for Clinician #1, a Newman-Keuls multiple comparison
between mean differences indicates that, although the overall F
ratio is significant, no individual difference between mean pairs
is great enough to be considered significant. Clinician #3 also
obtains a significant F indicating significant overall differences
between groups. Further, a Newman-Keuls multiple comparison reveals
that the mean of the interview condition is greater than the mean
of the combined rating condition. On referring to Table 51, it
is surprising to note that the mean of the combined rating condition
is lower than either of the test or interview rating conditions.

Intra-rater reliability (R*1) estimates are also moderate for
all three clinicians for Factor 17, being in the order of .30 to .40.
Apparently, candidates are rated differently in the three rating
conditions, but differences in rating made in a positive direction
are nearly equalled by differences in rating made in the negative
direction (low F and low R*1).
TABLE 49

One Way Analysis of Variance with Repeated Measures

Factor 17: Responsibility

Rater   Source of Variation   Sums of Squares   df   F             p             p*

1       Between               16.27             19
        Within                16.67             40
        Treatment              2.43              2   3.20          .05           .09
        Residual              14.23             38

2       Between               19.54             23
        Within                13.33             48
        Treatment               .98              2   [illegible]   [illegible]   [illegible]
        Residual              [illegible]       46

3       Between               35.12             29
        Within                37.33             60
        Treatment              6.75              2   6.41          .003          .02
        Residual              30.58             58
        Total                 72.45             89

p* = Conservative probability of F which makes allowances for unequal
covariances among correlated measures.
TABLE 50

Unadjusted and Adjusted Reliability Estimates

Factor 17: Responsibility

Clinician   Source   Unadjusted           Adjusted
                     Reliability          Reliability

1           Single   .26 (R1)             .30 (R*1)
1           Pooled   .51 (Rk)             .56 (R*k)
2           Single   .41 (R1)             .41 (R*1)
2           Pooled   .67 (Rk)             .67 (R*k)
3           Single   [illegible] (R1)     .30 (R*1)
3           Pooled   .43 (Rk)             .56 (R*k)

TABLE 51

Means and Standard Deviations

Factor 17: Responsibility

Clinician   Rating Condition   Mean          Standard Deviation

1           Interview          [illegible]   .62
1           Test               [illegible]   [illegible]
1           Combined           3.30          [illegible]
2           Interview          3.37          .63
2           Combined           3.33          .69
3           Interview          3.43          1.02
3           Test               3.00          .66
3           Combined           2.77          .84
Factor 18: General Suitability

Factor 18 is described as a "self explanatory" rating 
(Appendix 1) in that it refers to the overall suitability or overall 
rating of a candidate. It could be likened to the measure of 
general intelligence in intellectual assessments in that a composite 
is presumed. 

Statistics associated with Factor 18 are somewhat alarming. 
Only Clinician #3 obtains a significant F and a further Newman-Keuls 
multiple comparison indicates that the mean of the interview 
assessment condition is significantly greater than the mean of either 
of the other two rating conditions. 

It is the low R*1 values which disclose the most about the
rating of this factor. Values range from a low of .05 to a high
of .33, the lowest seen for any factor. This indicates considerable
intra-individual ranking differences. Candidates are not viewed
as uniform with respect to their overall suitability across rating
conditions.
TABLE 52

One Way Analysis of Variance with Repeated Measures

Factor 18: General Suitability

Rater   Source of Variation   Sums of Squares   df   F             p             p*

1       Between               14.31             19
        Within                26.67             40
        Treatment              2.13              2   1.65          .20           [illegible]
        Residual              24.53             38
        Total                 40.98             59

2       Between               24.61             23
        Within                30.00             48
        Treatment              3.36              2   2.90          .07           [illegible]
        Residual              26.64             46
        Total                 54.61             71

3       Between               27.65             29
        Within                26.00             60
        Treatment              3.49              2   4.49          .01           .04
        Residual              22.51             58
        Total                 53.65             89

p* = Conservative probability of F which makes allowances for unequal
covariances among correlated measures.
TABLE 53

Unadjusted and Adjusted Reliability Estimates

Factor 18: General Suitability

Clinician   Source   Unadjusted           Adjusted
                     Reliability          Reliability

1           Single   .04 (R1)             .05 (R*1)
1           Pooled   .12 (Rk)             .14 (R*k)
2           Single   [illegible] (R1)     .22 (R*1)
2           Pooled   .42 (Rk)             .46 (R*k)
3           Single   .29 (R1)             .33 (R*1)
3           Pooled   .59 (Rk)             .59 (R*k)

TABLE 54

Means and Standard Deviations

Factor 18: General Suitability

Clinician   Rating Condition   Mean          Standard Deviation

1           Interview          3.45          .74
1           Test               3.05          [illegible]
1           Combined           3.05          [illegible]
2           Interview          [illegible]   .62
2           Test               [illegible]   .89
2           Combined           [illegible]   .98
3           Interview          3.40          .66
3           Test               3.00          .82
3           Combined           2.97          [illegible]
Inter-Rater Reliability: Test Condition

As an added precaution against the possibility of clinicians 
remembering test profiles already used in the combined rating 
condition when they were rating in the test condition, all clinicians 
rated all 74 test profiles (their own plus those of the other two 
clinicians). As noted in Chapter 3, each profile was rated 
individually, in random sequential order, without identifying 
demographic information. A side effect of using this blind rating 
approach is that it is possible to see how closely the three 
clinicians involved in this study rated the same profiles; i.e., it 
is possible to obtain a measure of the inter-rater reliability 
(consensus) of test condition assessment decisions as well as the
intra-rater reliability estimates already presented. Inter-rater 
reliability is important because it can give us some idea of the 
consistency of three clinicians in rating similar profiles under 
similar situations. If inter-rater reliability is low, the problem 
of good predictions is further complicated. Not only would 
differences in rating situations be important in so far as the actual 
rating is concerned, but the rating made would also be extremely 
clinician-dependent. Although in this study, and in most real life 
assessment situations, a candidate is usually rated by only one 
clinician, it is interesting to note how much of the rating given
is "clinician-dependent" and how much is "clinician-independent"
with respect to assigned value. This says nothing about validity,
however, since high consistency does not necessarily lead to
prediction accuracy.
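The distinction between "clinician-dependent" and "clinician-independent" rating can be made concrete with a small numerical sketch. It is illustrative only: the ratings below are hypothetical, not data from this study, and the intraclass forms used are assumed rather than quoted from the thesis. Three raters rank six profiles identically, but the third rater pegs every rating one point higher. The unadjusted estimate (R1) is lowered by that constant difference; the adjusted estimate (R*1), from which rater mean differences are removed, is not.

```python
def anova_reliability(ratings):
    """ratings: one row per profile, one column per rater.
    Returns (R1, R*1) using assumed intraclass-correlation forms."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_between = k * sum((m - grand) ** 2 for m in row_means)   # between profiles
    ss_within = ss_total - ss_between                            # within profiles
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)    # rater mean differences
    ss_resid = ss_within - ss_raters

    ms_b = ss_between / (n - 1)
    ms_w = ss_within / (n * (k - 1))
    ms_r = ss_resid / ((n - 1) * (k - 1))
    r1 = (ms_b - ms_w) / (ms_b + (k - 1) * ms_w)
    r1_adj = (ms_b - ms_r) / (ms_b + (k - 1) * ms_r)
    return r1, r1_adj

# Hypothetical ratings on the 1 - 5 scale; the third rater is one point higher throughout.
ratings = [
    [2, 2, 3],
    [3, 3, 4],
    [4, 4, 5],
    [1, 1, 2],
    [3, 3, 4],
    [2, 2, 3],
]
r1, r1_adj = anova_reliability(ratings)
print(round(r1, 2), round(r1_adj, 2))
```

In this contrived case the adjusted estimate is essentially 1.0 while the unadjusted estimate is about .75, so the whole of the shortfall is attributable to a constant difference between raters rather than to disagreement about which profiles are stronger.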

The inter-rater reliability indices of the three clinicians'
ratings done in the test rating condition for all 74 subjects on
17 factors are summarized in Table 55. Factor 3 is not presented
since it will be recalled that Clinician #3 did not rate oral
communication in the test condition mode.

It is seen that the R1 inter-rater reliability estimates
vary from a low of .19 to a high of .89 with a mean value of .62.
With the exception of the R*1 value of .19 for Factor 12, all
reliability estimates ≥ .50. Factor 12, as was evident from
Tables 34-36, was a factor with which all clinicians had difficulty
in cross-group ratings; intra-rater reliability indices were also
very low. It seems that, even with a single category of information,
clinicians differ in their interpretation of "general energy
level" and/or how it is measured via psychometric profiles.
TABLE 55

Unadjusted and Adjusted Inter-Rater Reliability Estimates

Test Rating Condition

Factor   Unadjusted Reliability (R1)   Adjusted Reliability (R*1)

1        [illegible]                   .66
2        [illegible]                   .80
4        [illegible]                   [illegible]
5        .48                           .49
6        [illegible]                   .69
7        .66                           [illegible]
8        .65                           [illegible]
9        [illegible]                   [illegible]
10       [illegible]                   [illegible]
11       .87                           [illegible]
12       [illegible]                   .19
13       [illegible]                   [illegible]
14       [illegible]                   .50
15       [illegible]                   [illegible]
16       [illegible]                   [illegible]
17       [illegible]                   [illegible]
18       [illegible]                   [illegible]
Factor Analysis of Test Condition Ratings

A factor analysis of each clinician's ratings on the 18 factors 
for all candidates (N = 74) rated in the test condition was under- 
taken. This procedure was deemed useful to assist in a discussion 
of the results just presented. It was thought useful to examine 
clusters of similar factor ratings made between candidates to 
establish possible communalities of ratings. It seems likely that,
although factors have been presented as semantically and constructually
idiosyncratic (Appendix 1), there are common ratings made
on an individual between factors, i.e., ratings may be mutually
interdependent.

With this in mind, a principal axis factoring with varimax
orthogonal rotation was attempted with the results obtained from
each clinician's rating of the 74 candidates on the 18 factors in
the test condition. Since this factor analysis is not central to
this results section, findings are detailed in Appendix 6 and are
presented here in summary form only.

Using a threshold eigenvalue = 1.00, each clinician's
original ratings based on 18 factors were found to load significantly
on five major factors. The percentage of total variance of the
original 18 factors accounted for by the five new factors ranges
from 71% for Clinician #1 to 60% for Clinician #3.

The results of the factor analysis are presented individually
by clinician. A tentative descriptive title for each of the five
prime factors for each of the three clinicians is typed in brackets
immediately following the factors which appear to load significantly
on that factor.

Factor Loadings 
Clinician #1
FACTOR I: General intelligence + adaptability + potential for growth + 
readiness to learn + management level planning and problem solving. 
Total variance accounted for = 22% (INTELLECTUAL POTENTIAL). 
FACTOR II: Oral communication + leadership force + interpersonal 
effectiveness + self-confidence + supervisory effectiveness. Variance 
accounted for = 18%. (INTERPERSONAL FORCEFULNESS). 
FACTOR III: Efficiency of application + responsibility + general 
suitability. Variance accounted for = 17%. (RESPONSIBLE EFFICIENT 
WORK STYLE). 
FACTOR IV: Self-starting work drive + general energy level. 
Variance accounted for = 9%. (WORK DRIVE). 
FACTOR V: Common sense + self-reliance + autonomy. Variance 
accounted for = 9% (RESOURCEFULNESS). 
Total variance accounted for by factors I - V = 71%. 
Clinician #2 
FACTOR I: General intelligence + oral communication + potential 
for growth + readiness to learn + management level planning and 
problem solving + adaptability. Variance accounted for = 22%. 
(INTELLECTUAL POTENTIAL). 
FACTOR II: General energy level + responsibility + general suitability.
Variance accounted for = 15%. (DIRECTED ENERGY).
FACTOR III: Self-reliance + self-confidence + autonomy. Variance
accounted for = 13%. (RESOURCEFULNESS).

FACTOR IV: Common sense + interpersonal effectiveness + supervisory 
effectiveness. Variance accounted for = 10%. (INTERPERSONAL FORCEFUL- 
NESS). 

FACTOR V: Self-starting work drive + efficiency of application. 
Variance accounted for = 8%. (GOAL DIRECTED WORK DRIVE). 

Total variance accounted for by factors I - V = 68%. 

Clinician #3

FACTOR I: Leadership force + general energy level + self-confidence +
supervisory effectiveness. Variance accounted for = 14%. (DYNAMIC
LEADERSHIP).

FACTOR II: Adaptability + potential for growth + readiness to
learn + management level planning and problem solving. Variance
accounted for = 14%. (POTENTIAL ABILITY).

FACTOR III: Self-reliance + efficiency of application + responsibility
+ general suitability. Variance accounted for = 13%.
(RESOURCEFULNESS).

FACTOR IV: General intelligence + common sense. Variance accounted 
for = 9%. (PRACTICAL PROBLEM SOLVING). 

FACTOR V: Self-starting work drive + autonomy. Variance accounted 
for = 9%. (INDEPENDENT WORK STYLE). 


Total variance accounted for = 60%. 
Summary

Clinician #1

Factors with significant differences between means: 4, 10, 17
Mean R*1 value for all 18 factors = .41; standard deviation = .14
Factors with R*1 [illegible]

Clinician #2

Factors with significant differences between means: 2, 5, 7, 10, 14
Mean R*1 value for all 18 factors = .37; standard deviation = .15
Factors with R*1 [illegible]

Clinician #3

Factors with significant differences between means: 4, 8, 9, 10, 11,
[illegible], 17, 18
Mean R*1 value for 17 factors (excepting #3) = .38; standard deviation = .12
CHAPTER V
DISCUSSION 

In this chapter, results pertaining to each of the clinician 
by factor by rating condition interactions will be discussed. 
Common themes will be examined by clinician and factor, an attempt 
will be made to explain significant results, the utility of 
convergence as a psychological construct will be examined, and 
suggestions for further research will be detailed. 

Within the context of this study, convergence is probably 
best viewed as a condition affected by both mean differences and 
reliability estimates. It is possible to err with the ratings made,
both from the point of view of the actual rating assigned to a
candidate within any rating condition, and from the perspective
of differences in rating of a candidate made across conditions.
The first difference, which is often described as mean pegging error,
can be considered to be a constant. Errors of this type would
result in comparison errors when inflated scores from one group
are compared to deflated scores from another. This type of error
is unlikely to result in errors when considering an individual
within any group since rankings are not changed (each person is
being measured with the same, albeit incorrect, yardstick). Mean
pegging errors are also very easy to correct since changing all raw
rating scores from all groups to standard scores will standardize
between groups.
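The effect of a standard score transformation on mean pegging error can be sketched as follows. The ratings are hypothetical and are not drawn from this study. In the first comparison the test ratings are simply the interview ratings raised by one scale point, so the disagreement disappears entirely once each set is converted to standard scores. In the second comparison the candidates are ordered differently, and standardizing does nothing to reconcile the two sets.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Convert raw ratings to standard (z) scores within one rating condition."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]

# Hypothetical ratings of the same five candidates on the 1 - 5 scale.
interview = [1, 2, 2, 3, 4]
test = [x + 1 for x in interview]   # pure mean pegging error: one point higher throughout
combined = [3, 1, 4, 2, 2]          # candidates ranked differently, not merely shifted

z_interview = standardize(interview)
z_test = standardize(test)
z_combined = standardize(combined)

# Largest remaining disagreement after standardizing, for each comparison.
shift_gap = max(abs(a - b) for a, b in zip(z_interview, z_test))
rank_gap = max(abs(a - b) for a, b in zip(z_interview, z_combined))
print(round(shift_gap, 6), round(rank_gap, 2))
```

The first gap is zero to within rounding, while the second remains large; this is the sense in which low reliability, unlike a difference in means, cannot be removed by rescaling.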


From a consideration of the tables in Chapter 4, it is evident
that, even when the difference between rating condition means is
statistically very significant (e.g., Clinician #3 on Factor 8),
the actual numerical differences between the significantly different
mean-pairs are not great. Most of the raw score differences
between mean-pairs that are significantly different are in the order
of .50 - .80. Since actual ratings made on candidates are in whole
numbers within the range 1 - 5, it is unlikely that differences
across rating conditions for any candidate would exceed one. Thus,
significant differences between the means of the ratings made in
each category may not reflect practical differences between ratings
from the perspective of actual judgements made about that candidate.
What will be said differently about a candidate who scores 4 on a
factor versus that said about a candidate who scores 5?

Low R*1 estimates are a reflection of low concurrence in
subject ratings across rating conditions once mean error has been
removed. Low R*1 values should be viewed more seriously than high
F values since they cannot be eliminated by anything as simple as
a standard score transformation.

Low R*1 values may be thought of as reflecting either or both
of two possibilities: (1) basic clinician decision error of the
type noted in the equation, TRUE SCORE = OBTAINED SCORE + ERROR or,
(2) real differences inherent in the information available about
a candidate as a result of sampling in any of the three appraisal
conditions which would cause even a totally accurate clinician to




diagnose differentially dependent on assessment condition (i.e.,
real differences in the quantitative information available about a
candidate). The first possibility is that referred to by
researchers such as Little & Schneidman (1959) or Goldberg & Werts
(1966). The second has been ignored in the literature.

It is this second possibility that is most frustrating for
the researcher - and so face-saving for the clinician! It may be
that differences in intra-rater reliability are due not only to
clinician error, but also to differences in the ability
assessed in each condition. This could also be thought of as a
construct difference between factors which bear the same name in
each of the three conditions. It may never be possible to separate
these two types of "error", but it is wise to keep them in mind,
particularly when discussing intra-rater reliability.

It seems logical to presume that, when all three clinicians
obtain high R*1 values on the same factor, both types of error
would be minimized. Similarly, when one or two clinicians obtain
a high R*1 value on any factor, it is tenable to assume that the
lower R*1 value of the other clinician(s) on the same factor
reflects judgement errors (type 1) rather than real differences in
the level of ability assessed by different methods (type 2).
One would assume that the second type of error would be a constant
between and within any given clinician by factor cluster, with R*1
scores which are lower than the highest R*1 value obtained by any
of the three clinicians being due to clinician decision error.
It is also logical to assume that, when R*1 values are low for
all clinicians, or when there is considerable variability between
the R*1 values of each clinician on the same factor, both types of
error are greater, although the relationship between the magnitudes
of the two types of error is indeterminate. It should be noted that
inter-rater reliability estimates (Table 55) include only error
of the first type (judgement error), since the same sources of
information were available to all clinicians. This would be the
essential difference in the interpretation of intra- versus
inter-rater reliability estimates. With these different types of
error in mind, let us examine intra- and inter-rater reliability
in the present study.

Let us assume for the present that an acceptable level of
intra-rater reliability would be approximately .50. In actual
fact, the choice of any criterion value is always arbitrary,
representing a compromise between practical limitations and statistical
desirability. With an R*1 value equal to approximately .50,
we would assume that roughly 50% of the variance of any single
estimate of any factor represented true variance with the remaining
50% being due to error of various types. Although the choice of
.50 as a criterion value may appear somewhat lenient, it is
realistic given the differences between statistical and practical
significance vis-a-vis score assignment differences previously
discussed.


If one examines each clinician's R*1 estimates across all
18 factors, one sees that there are 12 factors where at least one
clinician obtains an R*1 approximately equal to .50. These factors
are Factor 4 (self-starting work drive), Factor 6 (leadership
force), Factor 7 (self-reliance), Factor 8 (adaptability), Factor 9
(potential for growth), Factor 11 (management level planning and
problem solving ability), Factor 13 (efficiency of application),
Factor 14 (self-confidence), Factor 15 (supervisory effectiveness),
Factor 1 (intelligence), Factor 2 (common sense), and Factor 10
(readiness to learn). In several cases, two or even three clinicians
obtain these criterion R*1 values for the factor noted. Factors
where no clinician achieves a criterion R*1 value are Factor 3
(oral communication), Factor 5 (interpersonal effectiveness),
Factor 12 (general energy level), Factor 16 (autonomy), Factor 17
(responsibility), and Factor 18 (general suitability). For these
factors, both types of error would be considerable.

As noted earlier, it is tenable to consider that, for the
factors where one or more clinicians obtain an R*1 value approximating
.50, the difference between this value and the R*1 value
obtained by the other clinician(s) on the same factor is comprised
primarily of clinician judgement error (type 1) rather than essential
differences in the levels of ability measured (type 2). Each
clinician has available the same types of information about each of
the candidates to be appraised as does every other clinician.
Therefore, errors of the second type would be presumed to be a
constant for all clinicians; possibly large, but still a constant.
If one clinician is able to obtain an R*1 value at a criterion level,
it seems likely that the other clinicians could have also obtained
that level save for their additional degree of clinician error.
It should be recognized that, even for a clinician who obtains a
criterion R*1 value, his ratings still consist of some portion of
both types of error.

Another indication of the contribution of the two types of
error to the convergence indices is in the relationship between
R*1 values across clinicians in one rating condition (inter-rater
reliability; test condition). As noted earlier, inter-rater
reliability estimates suffer only from the first type of error
whereas intra-rater estimates include both types. If the R*1
inter-rater value is high, but yet all of the R*1 intra-rater values
are low, one would presume a fair measure of the second type of
error (trait difference) is present.

In this regard, we see that Clinician #1 achieves a criterion
R*1 value on Factors 1, 4, 6, 7, 9, 11, and 14, or on 7 out of the 12
factors on which any clinician obtained a criterion R*1 value.
Clinician #2 obtains a criterion R*1 value on Factors 4, 6, 8, 9,
10, 11, or on 6 out of the 12 factors. Clinician #3 obtains a
criterion R*1 estimate on Factors 2, 6, 11, 13, 15 or on 5 out
of the 12 overall factors. Mean R*1 estimates differ only slightly
between clinicians: .41, .37, .38.


Factors where one or more clinicians do not achieve a criterion
R*1 value, but where at least one clinician does, are, for Clinician #1:
Factors 2, 8, 10, 13, 15; for Clinician #2: Factors 1, 2, 7, 13,
14, 15; and for Clinician #3: Factors 1, 4, 7, 8, 9, 10, 14.

Let us examine some areas where low R*1 estimates were
obtained. From what has already been said, we see that interpretive
problems were of two main types. For individual clinicians, these
would be factors on which one or more clinicians did not achieve
a criterion R*1 value even when a criterion R*1 estimate was
obtained by at least one other clinician on that same factor.
Interpretive problems for all clinicians collectively would be
factors on which no clinician achieved a criterion R*1 estimate.
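The sorting of factors into these two kinds of interpretive problem can be sketched in a few lines. This is an illustration rather than the procedure used in the study: the R*1 values are those read from Tables 44, 47, 50 and 53 for Factors 15 to 18, the function name is hypothetical, and "approximately .50" is treated as a hard cut-off of .50, which is a simplifying assumption.

```python
# R*1 values for Clinicians #1, #2 and #3, as read from Tables 44, 47, 50 and 53.
R_STAR_1 = {
    15: (0.40, 0.30, 0.56),
    16: (0.33, 0.33, 0.45),
    17: (0.30, 0.41, 0.30),
    18: (0.05, 0.22, 0.33),
}
CRITERION = 0.50  # assumption: "approximately .50" applied as a strict threshold

def classify(values, criterion=CRITERION):
    """Return the kind of interpretive problem and the clinicians below criterion."""
    below = [i + 1 for i, v in enumerate(values) if v < criterion]
    if not below:
        return "none", below
    if len(below) == len(values):
        return "collective", below   # no clinician reaches criterion
    return "individual", below       # at least one clinician reaches criterion

for factor, values in sorted(R_STAR_1.items()):
    kind, clinicians = classify(values)
    print(factor, kind, clinicians)
```

On this reading Factor 15 is an individual problem for Clinicians #1 and #2, whereas Factors 16, 17 and 18 are collective problems, which agrees with the groupings discussed in the text.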

Interpretive Problems: All Clinicians Collectively 

It was previously noted that R*1 estimates were below
criterion for all three clinicians on Factors 3 (oral communication),
5 (interpersonal effectiveness), 16 (autonomy), 17 (responsibility),
and 18 (overall rating). With the exception of Factors 17 and 18,
all of the factors noted above are of the interpersonal, oral
persuasiveness type (Appendix 1). It may be that these interpersonally
oriented factors are only poorly or differentially appraised
by psychometric and/or interview means, a possibility raised by
Hendricks (1969). It may also be tenable that, since several
tests or subtests are integrated by the clinician in rating any
single factor, differences across test evaluations on the same
candidate are of concern, a possibility which would explain Little
& Schneidman's (1959) study.


The error inherent in the low R*1 estimates for all
clinicians on these factors might be thought of as reflecting more
appraisal (type 2) error rather than clinician (type 1) error. Real
differences in the levels of ability may have been present which
would require that a "perfect" clinician obtain a low R*1 value in
order to reflect this actual difference.

Factor 17 (responsibility) and Factor 18 (overall estimate) 
are the two other factors on which no clinician obtained a criterion 
R*1 value. The problems in interpreting these two factors are 
similar. What is being measured varies from condition to condition, 
or, in the case of Factor 18, within conditions. In the interview 
condition, it is logical to assume judgements of responsibility 
were based on past performance and quite possibly interpersonal 
persuasiveness. In the psychometric condition, personnel tests, 
which largely measure personality characteristics, were used.
The differences between what is seen (interview) and what is seen
to be seen (test) could account for this difference.

With Factor 18, this problem is complicated since clinicians 
noted that they had difficulty in separating their evaluation in 
terms of suitability for a particular job versus their evaluation of 
suitability in terms of all candidates seen. This difficulty is 
reflected in the low R*1 values for all clinicians, particularly 
Clinician #1. 

Interpretive Problems: Individual Clinicians 
Clinician #1 


It was previously noted that Clinician #1 obtained R*1 estimates
below criterion on Factors 2 (common sense), 8 (adaptability),
10 (readiness to learn), 13 (efficiency of application), and 15
(supervisory effectiveness) even though at least one other clinician
obtained criterion R*1 estimates on these factors. Since at least
one clinician does obtain a criterion R*1 value, it seems likely
that we are dealing with increased clinician error (type 1) on
these factors. On examining these interpretively difficult factors
in the light of the factor analysis already described, it is reassuring 
to note that they are spread over 4 of the prime 5 factors. For 
purposes of interpretation and evaluation then, this would appear 
better than if these were clustered within one prime factor rendering 
this factor useless for prediction purposes.

Once again, an examination of the raw data reveals that seldom
is a candidate ranked differently by more than one point between rating
conditions. It may be that the measure R*1 is too sensitive given
the meaning and purpose of the ratings.

These factors all have a common description (Appendix 1) 
in that they are concerned with applied, concrete operations which
may be difficult to assess in an interview setting; i.e., prediction 
of on-the-job applied skills. 

Clinician #2 

As indicated earlier, Clinician #2 obtained below criterion
scores on Factors 1 (intelligence), 7 (self-reliance), 13 (efficiency
of application), 14 (self-confidence), and 15 (supervisory
effectiveness) even though at least one other clinician reached
criterion on these factors. Clinician error (type 1) would be
presumed to be higher on these factors than it would be on those
factors where a criterion R*1 value was obtained.

As was the case with Clinician #1, those factors on which a 
low R*1 value was obtained are spread over most of the five prime 
factors isolated by factor analysis rather than being clustered 
wholly within one prime factor. Factor III (resourcefulness), 
however, does contain two of these low R*1 factors which might
render cross-condition predictions rather tenuous. From an
observation of the raw data, it is also apparent that seldom does an
actual ranking difference between rating conditions exceed one for 
any of these factors. This further reduces the risk of actual 
differences in behavioral predictions based on numerically assigned 
differences between rating conditions. On further examining these
factors in the light of the definitions given in Appendix 1, it is
evident that they fall into two general areas: intellectual ability
and independent self-directed work style.

Clinician #3 

Clinician #3 obtained below criterion R*1 values on Factors 1
(intelligence), 4 (self-starting work drive), 7 (self-reliance),
8 (adaptability), 9 (potential for growth), 10 (readiness to learn),
and 14 (self-confidence). As was the case with the other two
clinicians, the low R*1 factors are spread across all of the five
major factors isolated by factor analysis, with the exception of
Factor II (potential ability) which includes three of them.

Commonalities


When looked at individually, it seems that clinicians had the 
most difficulty in rating convergently factors concerned with 
inner-directedness, work style and application, future potential
from the perspective of learning and application, and applied
intellectual problem solving. 

Conclusions and Indications 

1. Convergence across rating conditions is a far more elusive 
standard than is convergence within rating conditions. A comparison 
of inter-rater reliability indices with any of the intra-rater 
indices shows this very clearly. Logically, this is so because 
of the two different types of error discussed at several points.

2. With no exceptions, the most reliable indicator for 
purposes of analysis or prediction is the simple arithmetic mean 
of the three independent ratings for any given factor. In most
cases, this raises the reliability index by .20 or .30. 

3. On looking at the similarity between ratings across
conditions (F and R*1), one can seriously question the value of the 
interview technique as an evaluation tool. Combined ratings, which 
are the ones actually used for prediction in the organization, most 
closely resemble those of the test ratings. Interviews are 
expensive and seem to contribute only inconsistency; this being 
aside from their obvious public relations function! This is in line 
with Webster's (1964) findings. 


4. Differences in clinical decisions made by individual
clinicians vis-a-vis prediction cannot be discounted. It is necessary
to look at, not only how well a candidate is predicted to perform,
but who is making that prediction as well. If one combines the
best ratings of all clinicians, the power of our "super clinician"
is tremendous. If one combines the worst ratings...

5. Reliability, as it has come to be referred to in the 
literature, is not an adequate construct to use in comparing 
ratings across conditions. Even though we know that we have two 
distinct sources of error, we act as if we have only one by 
clinging to a traditional conceptualization. 

6. Although clinicians tend to look at the same general 
prime areas for purposes of personnel evaluation (factor analytic 
interpretation), the differences are in the weighting of these 
factors for decision making. 

7. Differences in mean ratings across conditions (high F) 
do not necessarily lead to differences in reliability (R*1) or 
vice versa. The consideration of either aspect singly is folly. 

8. Factors of an interpersonal oral persuasiveness nature 
tend to be differentially rated by all clinicians. 

9. As noted by Bray & Grant (1966), all factors are utilized 
for decision making but some contribute a great deal more in terms 
of weighting. 

10. Most of the key characteristics can be evaluated by 
interview (Grant & Bray, 1969), but that evaluation is often diffuse 

11. Even though clinicians differ widely in the amount of
experience they bring to this study, there is little apparent 
difference in level or style of decision making. This is in accord 
with Goldberg (1970) and Stricker (1967). 

12. Sawyer (1966) may be correct in describing the main use 
of the interview as in providing additional, non-psychometric 
information to the evaluation process. However, when that information 
is processed psychometrically, it does not concur with other ratings 
of similar abilities. 

13. In all cases where mean differences between rating 
conditions were noted, test results appeared to moderate interview 
impressions when rating in the combined condition. 

14. Perhaps Holt (1970) was most correct of all when he
said "...clinical psychologists vary considerably in their ability
to do the job, but the best of them can do very well" (p. 348).
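
The size of the gain noted in Conclusion 2 is in line with what the Spearman-Brown formula would lead one to expect for the mean of several parallel ratings. That formula was not part of the analysis in this study. The Python sketch below simply assumes that the reliability of a single rating is close to the mean R*1 values reported earlier (.37 to .41) and that three ratings are averaged; both are assumptions made for illustration.

```python
def spearman_brown(single_reliability, k):
    """Reliability of the mean of k parallel ratings (Spearman-Brown formula)."""
    return k * single_reliability / (1 + (k - 1) * single_reliability)


# Assumed single-rating reliabilities (the mean R*1 values quoted in the text)
# and an assumed pool of three ratings per factor.
single_values = [0.37, 0.38, 0.41]
gains = []
for r in single_values:
    pooled = spearman_brown(r, 3)
    gains.append(pooled - r)
    print(f"single = {r:.2f}  mean of 3 = {pooled:.2f}  gain = {pooled - r:.2f}")
```

Under these assumptions the predicted gain falls between .20 and .30, the range reported above.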

Suggestions for Further Research 

1. The most important suggestion concerns predictive
validity. Even though we now know a great deal regarding the
reliability (convergence) of clinical judgement, what is the 
predictive validity of judgements made in a cross-condition rating? 
Which rating condition best predicts future on-the-job behaviors? 
This would, of necessity, be a longitudinal study since only 
approximately 20% of the subjects involved in this study actually 
became employees of the companies for whom they were appraised. 


Because of the difficulties of comparing supervisor ratings
(criterion) with clinicians' ratings of future performance, it would
be hard to maintain the same degree of ecological or external
validity attained in this study.

2. It would be interesting and worthwhile, if any predictive 
validity study were undertaken, to ascertain the factor or factors 
(of either the 18 by 18 matrix or the 5 by 5 matrix) which either 
singly or in linear combination would best predict future performance. 
This study would be plagued by the same external validity problems 
as would #1 above, but is very important research. 

3. Although there are small differences between the results 
from each clinician, clinician sample size is too small to make 
any generalizable conclusions. A larger clinician sample might 
address itself to problems of clinical judgement, especially areas 
such as amount of professional training and amount of experience, 
problems raised by Goldberg (1970), Stricker (1967), Borke & Fiske 
(1957) and Oskamp (1965). It may be that as much would be lost 
as would be gained in this type of procedure vis-a-vis external 
validity. If a large number of clinicians were used, many clinicians 
would be called on to do tasks that they do not normally do because 
of experimental convenience. 

4. Judgement simulation, while possible with the present
data, was not a central aspect of this treatise. What is the possi-
bility of linearly or exponentially combining single decisions
in order to predict other decisions? Need we have a clinician at
all or would we be better at simulating a clinician when he is at
his best or most consistent? Holt (1970) addresses these problems
but not in any rigorous experimental sense.

5. How might psychological trainees be trained to simulate
or duplicate the decisions of our three "experts"? Would such a 
procedure be viable or desirable? Stricker (1967) would see this 
as feasible but what would be lost by such an approach? 

6. The two threats to reliability noted in this study merit 
examination from conceptual and practical perspectives. 

7. Replication of this study (or aspects of it) on a non- 
industrial clientele would contribute greatly in the area of 
generalizability. 

8. What is the generalizability of the factor analytic 
combination of the 18 by 18 matrix? What is the effect of feedback 
about your own decisions on future decisions? 


9. What is the effect, in terms of actual behavioral
prediction, of ratings differing only by one point? Are our statistics
too powerful for our procedures?

10. What is the cost effectiveness or utility of the various
approaches? What is gained by the three approaches and is it
worth the price?

APPENDIX 1

Definition of the 18 Characteristics Used in the Study

FACTOR DEFINITIONS

General Intelligence. Basic general capacity to learn and 
understand. 

Readiness to Learn. The individual's willingness to acquire 
new information, explore new ideas, methods, tasks, etc. 

Common Sense. The degree of ability to reach quick, practically- 
effective decisions about uncomplicated situations where sound 
judgement depends primarily on accumulated life and work experience, 
established precedent and procedures, etc. 

Management-Level Planning and Problem-Solving. The individual's 
ability to recognize the full depth and breadth of situations and 
problems and to consider the longer-range, as well as the here-and-now, 
consequences of their change or resolution. 

Oral Communication. The degree of clarity and ease with which 
the individual expresses himself in face-to-face discussion. 

General Energy Level. The level of physical vigor and 
vitality the individual will demonstrate in his day-to-day conduct. 

Self-Starting Work Drive. The degree to which the individual 
characteristically keeps himself continuously occupied in work- 
related activities without need of stimulation from his supervisor. 

Efficiency of Application. The economic and productive 
organization and application of work time and effort. 

General Interpersonal Effectiveness. The level of effective-
ness the individual demonstrates in day-to-day dealings with others
with regard to gaining and maintaining their respect for his ideas
and opinions, their confidence in his integrity, and their general
feelings of good will.

Self-Confidence. The degree of basic security the individual 
feels in his own ability to deal adequately with most situations
and people he encounters. 

Leadership Force. The amount of influence and dominance the 
individual habitually exerts over groups and persons he encounters. 

Supervisory Effectiveness. The individual's habitual effec- 
tiveness in directing, co-ordinating and controlling subordinates 
in standard work settings. 

Self-Reliance. The degree to which the individual carries
out assigned responsibilities without seeking direction, help,
encouragement and/or reassurance from co-workers.

Autonomy. The degree of the individual's need to make his
own decisions, regulate his own behavior, be his own boss, etc.

Adaptability. The level of ability to cope comfortably with
new and changing circumstances. 

Responsibility. The degree to which the individual lives up 
to personal, professional and business obligations he has tacitly 
or otherwise accepted. 

Potential for Growth. The degree of probability that the
individual will develop the personal resources to cope with increasingly
more complex and responsible work roles.


General Suitability for Job Concerned. Self-explanatory. 


APPENDIX 2 


Interview Rating Form 


CANDIDATE'S NAME 


CANDIDATE'S AGE 
ASSIGNMENT NUMBER NAME 
DATE 


RATER'S NAME 


Rate the candidate on each of the following characteristics according to 
the following code. Place the number that represents the most correct 
description in the space provided opposite each characteristic. 

1 = Poor; 2 = Marginal; 3 = Adequate; 4 = Good; 5 = Very Good 

If you are genuinely unable to rate a candidate on a characteristic,
leave the space opposite that characteristic blank.

 1. General Intelligence     ____    10. Readiness to Learn           ____
 2. Common Sense             ____    11. Management Level Planning    ____
 3. Oral Communication       ____    12. General Energy Level         ____
 4. Work Drive               ____    13. Efficiency of Application    ____
 5. Interpersonal/Effect.    ____    14. Self-Confidence              ____
 6. Leadership Force         ____    15. Supervisory Effectiveness    ____
 7. Self-Reliance            ____    16. Autonomy                     ____
 8. Adaptability             ____    17. Responsibility               ____
 9. Potential for Growth     ____    18. General Suitability          ____


INTERVIEW RATING FORM

APPENDIX 3


Interview + Test Rating Form 


CANDIDATE'S NAME 


CANDIDATE'S AGE 
ASSIGNMENT NUMBER NAME 


DATE 


RATER'S NAME 

Rate the candidate on each of the following characteristics according to 
the following code. Place the number that represents the most correct 
description in the space provided opposite each characteristic. 

1 = Poor; 2 = Marginal; 3 = Adequate; 4 = Good; 5 = Very Good 

If you are genuinely unable to rate a candidate on a characteristic,
leave the space opposite that characteristic blank.

 1. General Intelligence     ____    10. Readiness to Learn           ____
 2. Common Sense             ____    11. Management Level Planning    ____
 3. Oral Communication       ____    12. General Energy Level         ____
 4. Work Drive               ____    13. Efficiency of Application    ____
 5. Interpersonal/Effect.    ____    14. Self-Confidence              ____
 6. Leadership Force         ____    15. Supervisory Effectiveness    ____
 7. Self-Reliance            ____    16. Autonomy                     ____
 8. Adaptability             ____    17. Responsibility               ____
 9. Potential for Growth     ____    18. General Suitability          ____


INTERVIEW + TEST RATING FORM

APPENDIX 4


Test Rating Form


CANDIDATE'S NAME


CANDIDATE'S AGE 
ASSIGNMENT NUMBER NAME 


DATE 


RATER'S NAME 


Rate the candidate on each of the following characteristics according to 
the following code. Place the number that represents the most correct 
description in the space provided opposite each characteristic. 

1 = Poor; 2 = Marginal; 3 = Adequate; 4 = Good; 5 = Very Good 

If you are genuinely unable to rate a candidate on a characteristic,
leave the space opposite that characteristic blank.

 1. General Intelligence     ____    10. Readiness to Learn           ____
 2. Common Sense             ____    11. Management Level Planning    ____
 3. Oral Communication       ____    12. General Energy Level         ____
 4. Work Drive               ____    13. Efficiency of Application    ____
 5. Interpersonal/Effect.    ____    14. Self-Confidence              ____
 6. Leadership Force         ____    15. Supervisory Effectiveness    ____
 7. Self-Reliance            ____    16. Autonomy                     ____
 8. Adaptability             ____    17. Responsibility               ____
 9. Potential for Growth     ____    18. General Suitability          ____


TEST RATING FORM

APPENDIX 5


Dunnette (1971) Table 1 


Table 1
Assessment Methods Showing High Correlations with Each of Eight
Behavior Rating Factors and Overall Staff Prediction for College
and Non-College Men in the AT&T Management Progress Study

Assessment method                               College men    Non-college men

Factor I. General Effectiveness
  Performance in Cooperative Group Exercise     .60 [column unclear]
  Performance in Competitive Group Exercise     .67 [column unclear]
  Performance on In-Basket                      .60            .59
  Interview: Personal Impact                    [illegible]    .48
  Projective: Leadership Role                   .48            .51
  Personality Test: Dominance                   .33            [illegible]

Factor II. Administrative Skills
  Performance on In-Basket                      [illegible]    .68
  Performance in Competitive Group Exercise     .48            [illegible]
  Mental Ability Test                           .34            .72
  Interview: Personal Impact                    .42            .24
             Oral Communications Skills         .33            .53
  Projective: Leadership Role                   .36            .36
  Personality Test: Dominance                   .30            .30

Factor III. Interpersonal Skills
  Performance in Cooperative Group Exercise     .39            [illegible]
  Performance in Competitive Group Exercise     .62            .45
  Performance on In-Basket                      .45            .49
  Interview: Personal Impact                    .44            .25
             Human Relations Skills             .28            .46

Factor IV. Control of Feelings
  Performance in Competitive Group Exercise     .47            .36
  Performance in Cooperative Group Exercise     .37            .35
  Interview: Human Relations Skills             .23            .45
             Tolerance of Uncertainty           .30            .40
  Projective: Leadership Role                   .29            .46
              Dependence                        -.28           -.42

Factor V. Intellectual Ability
  Mental Ability Test                           .10            .62
  Interview: Oral Communications Skills         .40            .47

Factor VI. Work Orientation Motivation
  Projective: Work or Career Orientation        .50            .56
  Interview: Personal Impact                    .36            .50
             Inner Work Standards               .40            .43
  Performance in Cooperative Exercise           .30            .39
  Performance in Competitive Exercise           .45            .36
  Performance on In-Basket                      .44            .26

Factor VII. Passivity
  Interview: Need Advancement                   [illegible]    -.67
             Personal Impact                    -.38           -.58
             Need Security                      .50            .37
  Projective: Leadership Role                   -.47           -.40
              Achievement Motivation            -.44           -.50
  Performance in Competitive Exercise           [illegible]    -.36
  Performance in Cooperative Exercise           -.35           -.34
  Personality Test: General Activity            -.43 [column unclear]

Factor VIII. Dependency
  Projective: Affiliation                       .46            .41
              Dependence                        .49            [illegible]

Overall Staff Prediction
  Performance in Competitive Exercise           .60            .38
  Performance on In-Basket                      .55            .51
  Performance in Cooperative Exercise           .41            .42
  Interview: Personal Impact                    .49            [illegible]
             Oral Communications Skills         .41            .48
  Projective: Achievement Motivation            .30            .40

APPENDIX 6

Principal Components Factor Analysis (Varimax Rotation)
of Characteristics Appraised in Test Rating Condition

APPENDIX 6a
Principal Components Factor Analysis (Varimax Rotation)
of 18 Characteristics Appraised in Test Rating Condition:

Clinician #1 (N=74)

Appraised         Factor       Factor       Factor       Factor       Factor       H*2
Characteristic    1            2            3            4            5

       1          .83          -.09         -.05         .07          .09          .70
       2          .22          .16          [illegible]  .05          [illegible]  .46
       3          .38          .77          [illegible]  .18          .06          .79
       4          [illegible]  .10          [illegible]  .83          [illegible]  [illegible]
       5          [illegible]  [illegible]  .26          -.08         .37          .74
       6          [illegible]  .76          [illegible]  .09          .26          [illegible]
       7          -.04         .20          .38          .19          .62          .60
       8          .77          .11          [illegible]  -.20         [illegible]  .68
       9          [illegible]  .29          .18          .10          .05          .83
      10          .82          .13          .20          [illegible]  -.20         .78
      11          .86          .06          [illegible]  [illegible]  -.003        [illegible]
      12          .12          .05          .39          .76          .17          .77
      13          .02          [illegible]  .85          .06          .07          .74
      14          .05          .80          [illegible]  .24          .12          [illegible]
      15          .12          .72          .32          [illegible]  .01          .75
      16          .20          .23          [illegible]  .03          .73          .62
      17          .10          .20          [illegible]  .12          [illegible]  .78
      18          .47          .41          [illegible]  .001         .12          .80

Variance          3.96         3.32         2.45         1.61         1.56         12.91
% of Total
Variance          22%          18%          14%          9%           9%           72%
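
As a reading aid for the table above: in a principal components solution the communality (H*2) of a characteristic is the sum of its squared loadings, and the percentage of total variance for a factor is its variance divided by the number of characteristics (18). The Python sketch below checks this against the first row and the variance row of the table. The function names are illustrative, and small discrepancies from the printed values are to be expected because the printed loadings are rounded.

```python
N_CHARACTERISTICS = 18  # number of appraised characteristics in the table


def communality(loadings):
    """Sum of squared loadings of one characteristic across the factors."""
    return sum(x * x for x in loadings)


def percent_of_total_variance(variance, n_variables=N_CHARACTERISTICS):
    """Share of total variance for one factor, in percent."""
    return 100.0 * variance / n_variables


# Characteristic 1, Clinician #1. The sign of the third loading is uncertain
# in the scan; it does not affect the squared sum.
row_1 = [0.83, -0.09, -0.05, 0.07, 0.09]
print(f"communality of characteristic 1: {communality(row_1):.4f}")

# Variance row for the five factors as printed in the table.
variances = [3.96, 3.32, 2.45, 1.61, 1.56]
percents = [round(percent_of_total_variance(v)) for v in variances]
print("percent of total variance:", percents)
```

The computed communality is close to the printed .70, and the rounded percentages reproduce the printed row of 22%, 18%, 14%, 9%, and 9%.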

APPENDIX 6b
Principal Components Factor Analysis (Varimax Rotation)
of 18 Characteristics Appraised in Test Rating Condition:

Clinician #2 (N=7[illegible])

Appraised         Factor       Factor       Factor       Factor       Factor       H*2
Characteristic    1            2            3            4            5

       1          .76          .02          -.03         .14          .32          .70
       2          .25          .22          -.23         .68          .08          .64
       3          .82          -.02         .04          [illegible]  -.07         .72
       4          -.002        .05          .14          .04          .83          .71
       5          [illegible]  -.09         .25          .71          [illegible]  .60
       6          -.27         .46          .38          .36          -.12         [illegible]
       7          .04          [illegible]  .80          -.06         -.005        [illegible]
       8          .66          .20          [illegible]  .05          -.37         .67
       9          [illegible]  .45          .13          .14          -.13         .76
      10          [illegible]  [illegible]  -.08         -.05         -.29         .76
      11          .85          -.03         [illegible]  .08          .05          .73
      12          [illegible]  .68          [illegible]  .07          .09          [illegible]
      13          -.20         .56          -.20         -.13         .53          .69
      14          -.008        -.04         [illegible]  [illegible]  .04          .61
      15          .05          .36          [illegible]  .70          -.08         .67
      16          [illegible]  .009         .79          .07          .05          .65
      17          .25          .76          -.10         .18          -.03         .69
      18          .41          [illegible]  .30          [illegible]  .09          [illegible]

Variance          [illegible]  [illegible]  [illegible]  [illegible]  1.36         [illegible]

APPENDIX 6c
Principal Components Factor Analysis (Varimax Rotation)
of 17 Characteristics Appraised in Test Rating Condition:


Appraised Factor 
Characteristic ip 
ul =. 07° 
2 -.10 
cs 
3) 48 
6 -18 
i) ales 
8 06 
g -.04 
10 05 
Ie 14 
£2 66 
13 mle 
14 66 
LS 57 
16 20 
Ey -.04 
18 -48 
Variance 2.44 


mio Total 
Variance 


Clinician #3 (N=75)


Factor 


2 


Factor Factor Factor H*2 
3 4 5 
i 66 14 53 
-.04 69 -.21 54 
10 -.10 -.68 2 
30 di, = .45 62 
as} -.08 -.26 76 
¥D =o] 10 63 
-.09 -.05 10 61 
31 ums) -.18 ts 
ok =< L0 -.08 517] 
02 34 iy Do 
23 a Oo O4 Sul 
5g 32 24 Se) 
Sy ey, =.15 -.002 Heke) 
-.14 41 37 65 
ew ser Bl 61 55 
Alsi) <7 -.26 O60 
5 31 Seyi} 66 
2.24 Leo oro AMO PSAs 
13% 9% 9% 60% 

APPENDIX 7
CAPSULE SUMMARY OF TESTS 

Listed below is a short description of each of the tests used 
in this study. For complete information regarding a specific test, the 
reader is referred to the appropriate test manual.

Differential Aptitude Tests

Verbal Reasoning. This is a verbal concept understanding test. 
It is designed to evaluate the ability to abstract, generalize, and 
to think constructively. Testing format involves verbal analogies. 

Abstract Reasoning. This is a non-verbal reasoning ability test. 
The testee is required to formulate operating principles in changing 
abstract diagrams. Operating principles involve the use of logic.

Wonderlic Personnel Test

The Wonderlic is a test of mental ability. It is widely used 
as a selection tool in hiring and as an indicator of future develop- 
ment possibility. 


Watson-Glaser Critical Thinking Appraisal

This test involves the appraisal of important critical thinking
skills (inference, recognition of assumptions, deduction, interpretation
and evaluation of arguments) in everyday situations.

Business Judgement Test

This test is designed to measure empathy and knowledge of generally
accepted ways of behaving in business interpersonal situations.

Test of Practical Judgement

This test is designed to evaluate the testee's ability to select
the best solution to factual and complex interpersonal business
problems.

Supervisory Practices Test


This test is designed to appraise supervisory ability or potential 
ability. It is directly concerned with supervisory thinking, attitudes, 
and opinions. 

Management Aptitude Inventory 

This inventory is designed to assess characteristics related
to success in managerial positions (intelligent job performance,
leadership qualities, proper job attitude, and relations with others).

Vocational Preference Inventory

This is a personality questionnaire which uses preference for
vocational titles as a measure of personality style. It is designed to
assess areas such as interpersonal relations, interests, values, self- 
conception, coping behavior, and identification. 

Edwards Personal Preference Schedule 

This personality test provides a convenient measure of normal 
personality variables such as achievement, deference, order, exhibition,
autonomy, affiliation, intraception, succorance, dominance, abasement, 
nurturance, change, endurance, heterosexuality, and aggression.

California Psychological Inventory

This is a multiple choice personality test which measures 18 
personality variables in four general areas (measures of poise, 
ascendancy, and self assurance; measures of socialization, maturity 
and responsibility; measures of intellectual potential; measures of
personal orientation and values).